Form of Newton's method used in statistics
Scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically, named after Ronald Fisher.
Sketch of derivation

Let $Y_1, \ldots, Y_n$ be random variables, independent and identically distributed with twice differentiable p.d.f. $f(y;\theta)$, and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^{*}$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:
$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$

where
$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla \nabla^{\top} \right|_{\theta = \theta_0} \log f(Y_i;\theta)$$

is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^{*}$, using that $V(\theta^{*}) = 0$ and rearranging gives us:
$$\theta^{*} \approx \theta_0 + \mathcal{J}^{-1}(\theta_0)\, V(\theta_0).$$

We therefore use the algorithm
$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m)\, V(\theta_m),$$

and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^{*}$.
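As a concrete illustration, the following is a minimal Python sketch of this iteration for the location parameter of a unit-scale Cauchy sample, a model whose M.L.E. has no closed form. The function names, simulated data, and starting value are illustrative assumptions, not part of the algorithm's statement.

```python
import numpy as np

def score(theta, y):
    # V(theta): sum of d/d(theta) log f(y_i; theta) for the unit-scale
    # Cauchy density f(y; theta) = 1 / (pi * (1 + (y - theta)**2)).
    u = y - theta
    return np.sum(2 * u / (1 + u**2))

def observed_information(theta, y):
    # J(theta): minus the sum of second derivatives of the log-density.
    u = y - theta
    return np.sum(2 * (1 - u**2) / (1 + u**2) ** 2)

def newton_scoring(y, theta0, tol=1e-10, max_iter=100):
    # Iterate theta <- theta + J(theta)^{-1} V(theta) until the step is tiny.
    theta = theta0
    for _ in range(max_iter):
        step = score(theta, y) / observed_information(theta, y)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(200) + 3.0             # sample with true location 3
print(newton_scoring(y, theta0=np.median(y)))  # median is a robust start
```

Note that $\mathcal{J}(\theta)$ need not be positive when $\theta$ is far from the bulk of the data, in which case the step may not point uphill; this is one practical motivation for the replacement described next.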
In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:
$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m)\, V(\theta_m).$$

Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
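Continuing the illustrative Cauchy sketch above (reusing its score function and data y), the Fisher information of a unit-scale Cauchy location parameter has the known closed form $\mathcal{I}(\theta) = n/2$. Substituting this constant for the data-dependent $\mathcal{J}(\theta_m)$ gives a minimal sketch of Fisher scoring:

```python
def fisher_scoring(y, theta0, tol=1e-10, max_iter=100):
    # For the unit-scale Cauchy location model the Fisher information is
    # constant in theta: I(theta) = n / 2 (standard closed form).
    info = len(y) / 2.0
    theta = theta0
    for _ in range(max_iter):
        step = score(theta, y) / info
        theta += step
        if abs(step) < tol:
            break
    return theta

print(fisher_scoring(y, theta0=np.median(y)))
```

Because the Fisher information is positive semi-definite by construction (and typically positive definite), every Fisher scoring step is an ascent direction for the log-likelihood, which is a common reason to prefer it over the observed-information iteration.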
References

1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika. 74 (4): 817–827. doi:10.1093/biomet/74.4.817.
2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.