Studentized residual

From Wikipedia, the free encyclopedia
Kind of ratio
For broader coverage of this topic, see Studentization.

In statistics, a studentized residual is the dimensionless ratio resulting from the division of a residual by an estimate of its standard deviation, both expressed in the same units. It is a form of a Student's t-statistic, with the estimate of error varying between points.

This is an important technique in the detection of outliers. It is among several named in honor of William Sealy Gosset, who wrote under the pseudonym "Student" (e.g., Student's distribution). Dividing a statistic by a sample standard deviation is called studentizing, in analogy with standardizing and normalizing.

Motivation

See also: Errors and residuals in statistics

The key reason for studentizing is that, in regression analysis of a multivariate distribution, the variances of the residuals at different input variable values may differ, even if the variances of the errors at these different input variable values are equal. The issue is the difference between errors and residuals in statistics, particularly the behavior of residuals in regressions.

Consider the simple linear regression model

$$Y = \alpha_0 + \alpha_1 X + \varepsilon.$$

Given a random sample $(X_i, Y_i)$, $i = 1, \ldots, n$, each pair $(X_i, Y_i)$ satisfies

$$Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i,$$

where the errors $\varepsilon_i$ are independent and all have the same variance $\sigma^2$. The residuals are not the true errors, but estimates, based on the observable data. When the method of least squares is used to estimate $\alpha_0$ and $\alpha_1$, then the residuals $\widehat{\varepsilon}$, unlike the errors $\varepsilon$, cannot be independent since they satisfy the two constraints

$$\sum_{i=1}^{n} \widehat{\varepsilon}_i = 0$$

and

$$\sum_{i=1}^{n} \widehat{\varepsilon}_i x_i = 0.$$

(Here $\varepsilon_i$ is the $i$th error, and $\widehat{\varepsilon}_i$ is the $i$th residual.)
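The two constraints can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set; the values in `xs` and `ys` and the helper name `fit_line` are illustrative and do not come from the article. It fits the line by least squares and confirms that the residuals sum to zero and are orthogonal to the x-values, up to rounding error.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for y = a0 + a1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


# Illustrative data (hypothetical values, not from the article).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a0, a1 = fit_line(xs, ys)
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]

sum_resid = sum(residuals)                               # first constraint
sum_resid_x = sum(r * x for r, x in zip(residuals, xs))  # second constraint

# Both sums vanish up to floating-point rounding.
print(abs(sum_resid) < 1e-9, abs(sum_resid_x) < 1e-9)
```

Because two linear constraints tie the n residuals together, only n − 2 of them are free to vary, which is why they cannot be independent.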

The residuals, unlike the errors, do not all have the same variance: the variance decreases as the corresponding x-value gets farther from the average x-value. This is not a feature of the data itself, but of the regression better fitting values at the ends of the domain. It is also reflected in the influence functions of various data points on the regression coefficients: endpoints have more influence. This can also be seen because the residuals at endpoints depend greatly on the slope of a fitted line, while the residuals at the middle are relatively insensitive to the slope. The fact that the variances of the residuals differ, even though the variances of the true errors are all equal to each other, is the principal reason for the need for studentization.

It is not simply a matter of the population parameters (mean and standard deviation) being unknown; it is that regressions yield different residual distributions at different data points, unlike point estimators of univariate distributions, which share a common distribution for residuals.

Background


For this simple model, the design matrix is

$$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$

and the hat matrix $H$ is the matrix of the orthogonal projection onto the column space of the design matrix:

$$H = X (X^T X)^{-1} X^T.$$

The leverage $h_{ii}$ is the $i$th diagonal entry in the hat matrix. The variance of the $i$th residual is

$$\operatorname{var}(\widehat{\varepsilon}_i) = \sigma^2 (1 - h_{ii}).$$

In case the design matrix $X$ has only two columns (as in the example above), this is equal to

$$\operatorname{var}(\widehat{\varepsilon}_i) = \sigma^2 \left(1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right).$$

In the case of an arithmetic mean, the design matrix $X$ has only one column (a vector of ones), and this is simply:

$$\operatorname{var}(\widehat{\varepsilon}_i) = \sigma^2 \left(1 - \frac{1}{n}\right).$$
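To make the two-column formula concrete, the sketch below evaluates the leverages and the factor var(residual)/σ² for five equally spaced x-values. The data and the function name `leverages` are assumptions chosen for illustration; the closed form used is the one given above for a design matrix with an intercept column and one x column.

```python
def leverages(xs):
    """Leverages h_ii for the two-column design matrix with rows [1, x_i]."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical x-values
h = leverages(xs)

# var(residual_i) / sigma^2 = 1 - h_ii
variance_factors = [1.0 - h_ii for h_ii in h]

for x, h_ii, v in zip(xs, h, variance_factors):
    print(f"x = {x:.0f}  h_ii = {h_ii:.2f}  var/sigma^2 = {v:.2f}")
```

With these x-values the factor falls from 0.8 at the centre to 0.4 at either end, and the leverages sum to 2, the number of columns of the design matrix.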

Calculation


Given the definitions above, the studentized residual is then

$$t_i = \frac{\widehat{\varepsilon}_i}{\widehat{\sigma} \sqrt{1 - h_{ii}}}$$

where $h_{ii}$ is the leverage, and $\widehat{\sigma}$ is an appropriate estimate of $\sigma$ (see below).

In the case of a mean, this is equal to:

$$t_i = \frac{\widehat{\varepsilon}_i}{\widehat{\sigma} \sqrt{(n-1)/n}}$$
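As a small illustration of the mean case, the sketch below studentizes the deviations of a sample from its arithmetic mean. It assumes that the estimate of σ is the ordinary sample standard deviation (divisor n − 1); the sample values and the function name `studentized_about_mean` are hypothetical.

```python
import math
import statistics


def studentized_about_mean(values):
    """Studentized residuals when the fitted model is a single mean."""
    n = len(values)
    mean = statistics.fmean(values)
    # Assumption: sigma is estimated by the sample standard deviation,
    # which divides the sum of squared deviations by n - 1.
    sigma_hat = statistics.stdev(values)
    scale = sigma_hat * math.sqrt((n - 1) / n)
    return [(v - mean) / scale for v in values]


values = [0.0, 0.0, 0.0, 1.0]   # hypothetical sample with one extreme value
t = studentized_about_mean(values)
print([round(ti, 4) for ti in t])
```

For this sample the three equal observations each give −1/√3 and the extreme observation gives √3.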

Internal and external studentization


The usual estimate of $\sigma^2$, which gives the internally studentized residual, is

$$\widehat{\sigma}^2 = \frac{1}{n-m} \sum_{j=1}^{n} \widehat{\varepsilon}_j^{\,2},$$

where $m$ is the number of parameters in the model (2 in our example).

But if the $i$th case is suspected of being improbably large, then it would also not be normally distributed. Hence it is prudent to exclude the $i$th observation from the process of estimating the variance when one is considering whether the $i$th case may be an outlier, and instead use the estimate that gives the externally studentized residual, which is

$$\widehat{\sigma}_{(i)}^2 = \frac{1}{n-m-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \widehat{\varepsilon}_j^{\,2},$$

based on all the residuals except the suspect $i$th residual. It should be emphasized that the residuals $\widehat{\varepsilon}_j$ ($j \neq i$) in this sum are computed from the fit with the $i$th case excluded.

If the estimate $\widehat{\sigma}^2$, which includes the $i$th case, is used, then the result is called the internally studentized residual, $t_i$ (also known as the standardized residual[1]). If the estimate $\widehat{\sigma}_{(i)}^2$, which excludes the $i$th case, is used instead, then the result is called the externally studentized residual, $t_{i(i)}$.
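The difference between the two estimates can be seen in a short sketch for the simple linear regression model (m = 2). The data, the helper `fit_line` and the function `studentized_residuals` are assumptions made for illustration. The external version refits the line with case i left out, as described above, while still using the residual and leverage of case i from the full fit.

```python
import math


def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for y = a0 + a1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - a1 * x_bar, a1


def studentized_residuals(xs, ys):
    """Return (internal, external) studentized residuals for a fitted line."""
    n, m = len(xs), 2
    a0, a1 = fit_line(xs, ys)
    resid = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    h = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]

    # Internal: sigma^2 estimated from all n residuals.
    sigma2 = sum(r * r for r in resid) / (n - m)
    internal = [r / math.sqrt(sigma2 * (1.0 - hi)) for r, hi in zip(resid, h)]

    # External: refit without case i and estimate sigma^2 from that fit.
    external = []
    for i in range(n):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(xs_i, ys_i)
        rss_i = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs_i, ys_i))
        sigma2_i = rss_i / (n - m - 1)
        external.append(resid[i] / math.sqrt(sigma2_i * (1.0 - h[i])))
    return internal, external


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical data
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 9.0]   # last point sits well above the trend
internal, external = studentized_residuals(xs, ys)
for ti, te in zip(internal, external):
    print(f"internal = {ti:+.3f}   external = {te:+.3f}")
```

In this sketch the suspect last point inflates the internal estimate of σ and so masks itself, whereas its external value is much larger in magnitude. The explicit refit is used here only for clarity; the two quantities are also linked by the standard identity that the external value equals t_i √((n − m − 1)/(n − m − t_i²)), which avoids refitting.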

Distribution

"Tau distribution" redirects here; not to be confused with Tau coefficient.

If the errors are independent and normally distributed with expected value 0 and variance $\sigma^2$, then the probability distribution of the $i$th externally studentized residual $t_{i(i)}$ is a Student's t-distribution with $n - m - 1$ degrees of freedom, and can range from $-\infty$ to $+\infty$.

On the other hand, the internally studentized residuals are in the range $0 \pm \sqrt{\nu}$, where $\nu = n - m$ is the number of residual degrees of freedom. If $t_i$ represents the internally studentized residual, and again assuming that the errors are independent identically distributed Gaussian variables, then:[2]

$$t_i \sim \sqrt{\nu}\, \frac{t}{\sqrt{t^2 + \nu - 1}}$$

where $t$ is a random variable distributed as Student's t-distribution with $\nu - 1$ degrees of freedom. In fact, this implies that $t_i^2/\nu$ follows the beta distribution $B(1/2, (\nu - 1)/2)$. The distribution above is sometimes referred to as the tau distribution;[2] it was first derived by Thompson in 1935.[3]

When $\nu = 3$, the internally studentized residuals are uniformly distributed between $-\sqrt{3}$ and $+\sqrt{3}$. If there is only one residual degree of freedom, the above formula for the distribution of internally studentized residuals doesn't apply. In this case, the $t_i$ are all either +1 or −1, with 50% chance for each.
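The uniform case for ν = 3 follows from the beta distribution stated above. The following is a short sketch of the argument, writing $U = t_i^2/\nu$ (a symbol introduced here only for convenience) and using the fact that the distribution of $t_i$ is symmetric about 0:

```latex
\begin{aligned}
U = \frac{t_i^2}{3} &\sim B\!\left(\tfrac{1}{2},\, 1\right),
  \qquad f_U(u) = \tfrac{1}{2}\, u^{-1/2}, \quad 0 < u < 1, \\
P\left(|t_i| \le s\right) &= P\!\left(U \le \tfrac{s^2}{3}\right)
  = \int_0^{s^2/3} \tfrac{1}{2}\, u^{-1/2}\, du
  = \frac{s}{\sqrt{3}}, \qquad 0 \le s \le \sqrt{3}.
\end{aligned}
```

Since this probability is linear in $s$, $|t_i|$ is uniform on $[0, \sqrt{3}]$, and by symmetry $t_i$ is uniform on $[-\sqrt{3}, +\sqrt{3}]$.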

The standard deviation of the distribution of internally studentized residuals is always 1, but this does not imply that the standard deviation of all the $t_i$ of a particular experiment is 1. For instance, the internally studentized residuals when fitting a straight line going through (0, 0) to the points (1, 4), (2, −1), (2, −1) are $\sqrt{2},\ -\sqrt{5}/5,\ -\sqrt{5}/5$, and the standard deviation of these is not 1.
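The figures in this example can be reproduced with a few lines of plain Python. The sketch assumes the model y = b·x with a single parameter (m = 1), so that the leverages are $x_i^2 / \sum_j x_j^2$; the variable names are illustrative.

```python
import math
import statistics

# Points from the example in the text; the model is y = b * x (no intercept).
xs = [1.0, 2.0, 2.0]
ys = [4.0, -1.0, -1.0]

n, m = len(xs), 1                      # one parameter: the slope
sum_x2 = sum(x * x for x in xs)
slope = sum(x * y for x, y in zip(xs, ys)) / sum_x2

resid = [y - slope * x for x, y in zip(xs, ys)]
h = [x * x / sum_x2 for x in xs]       # leverages for a single-column design
sigma_hat = math.sqrt(sum(r * r for r in resid) / (n - m))

# Internally studentized residuals.
t = [r / (sigma_hat * math.sqrt(1.0 - hi)) for r, hi in zip(resid, h)]
print([round(ti, 4) for ti in t])

# Sample standard deviation of the three values, to compare with 1.
print(round(statistics.stdev(t), 4))
```

The fitted slope is 0, so the residuals equal the y-values, and the three studentized values agree with √2, −√5/5 and −√5/5 up to rounding.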

Note that any pair of studentized residuals $t_i$ and $t_j$ (where $i \neq j$) are not i.i.d. They have the same distribution, but are not independent, because the residuals are constrained to sum to 0 and to be orthogonal to the columns of the design matrix.

Software implementations


Many programs and statistics packages, such as R, Python, etc., include implementations of the studentized residual.

| Language/Program | Function | Notes |
|---|---|---|
| R | rstandard(model, ...) | internally studentized |
| R | rstudent(model, ...) | externally studentized |


References

  1. Regression Deletion Diagnostics, R docs
  2. Allen J. Pope (1976), "The statistics of residuals and the detection of outliers", U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Survey, Geodetic Research and Development Laboratory, 136 pages, eq. (6)
  3. Thompson, William R. (1935). "On a Criterion for the Rejection of Observations and the Distribution of the Ratio of Deviation to Sample Standard Deviation". The Annals of Mathematical Statistics. 6 (4): 214–219. doi:10.1214/aoms/1177732567.
