
Talk:Levenberg–Marquardt algorithm

From Wikipedia, the free encyclopedia
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:

  • WikiProject Mathematics (Low-priority)
  • WikiProject Statistics (Low-importance)

L-M is for nonlinear least squares problems only


The article needs to make clear that the L-M algorithm is for solving nonlinear least squares problems, not the more general class of nonlinear optimization problems. In fact, the introduction used to give the impression that the L-M algorithm is used to solve general nonlinear optimization problems, which is not true.

Functions vs. Values


The article talks about "the functions f(x_i, β + δ)". This confused the heck out of me, until I realized that what is meant is the values of the function f(x, β + δ) at x_i. Similarly confused terminology is used at other points in the article, for example when J_i is called the gradient. These kinds of terminology issues make the article hard to read. — Preceding unsigned comment added by 80.65.109.181 (talk) 12:09, 13 September 2013 (UTC)[reply]

Inconsistency


The article says that Levenberg's contribution is in introducing lambda. But at the same time the references say that he was the first to publish the algorithm, in 1944. This seems a bit contradictory to me. Ravn-hawk (talk) 15:38, 28 September 2011 (UTC)[reply]

Possible replacement article?


Could I please recommend that people interested in this article look at Sam Roweis' document on Levenberg–Marquardt at <http://www.cs.toronto.edu/~roweis/notes/lm.pdf>. It seems to be a very complete description. If it were simply pasted in (pdf2html?) it would answer the questions below and add detail. This would have to be checked with the author, of course, which I will be happy to do. I am new to this, so my etiquette is probably short of the mark, but I hope this is useful.

I suggest that if you think there are valuable insights in Roweis' article you might use the information there to improve the existing Wikipedia article. The questions below are not fundamentally important, but the article can be improved. There is certainly no case for wholesale replacement. Billlion 09:49, 5 September 2006 (UTC)[reply]

correction


Should

(JᵀJ + λ)q = −Jᵀf

be

(JᵀJ + λI)q = −Jᵀf ?

The preceding unsigned comment was added by 129.13.70.88 (talk • contribs) on 08:18, 29 August 2005.

Both formulas have the same meaning, namely JᵀJq + λq = −Jᵀf. As far as I am concerned, it is a matter of taste which you prefer, but the second is perhaps clearer. --Jitse Niesen (talk) 20:46, 30 August 2005 (UTC)[reply]
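A side note for implementers (my own illustration, not from the discussion above): the two notations are mathematically equivalent, because a scalar λ acting on the vector q is the same as λI acting on it. In array languages, though, the identity matrix must be written out explicitly, since adding a bare scalar to a matrix broadcasts over every entry:

```python
import numpy as np

J = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
lam = 0.5
A = J.T @ J  # the normal-equations matrix J^T J

# (J^T J + λ)q means λ acts as a scalar on q, i.e. the same as λI.
# In code the identity must be explicit:
damped = A + lam * np.eye(2)

# Naive elementwise addition is NOT the same thing -- it perturbs
# every entry of the matrix, not just the diagonal:
broadcast = A + lam

assert not np.allclose(damped, broadcast)
assert np.allclose(np.diag(damped), np.diag(A) + lam)
```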

All the formulas are missing a negative sign. For example:

(JᵀJ)δ = Jᵀ[y − f(β)]

should be:

(JᵀJ)δ = −Jᵀ[y − f(β)]

Finding this bug took quite some time. And it should be pointed out how to get to this formula. --vogt31337 9:00, 26 January 2012 (UTC) — Preceding unsigned comment added by Vogt31337 (talk • contribs)

@ Vogt31337: I think there is no sign missing, since J is the Jacobian of f and f has a negative sign in S. The goal is to minimize S, and therefore we want to go in the direction of the negative gradient of S, which turns out to be the negative Jacobian of −f, and thus −(−J) = J. -- Georg

@ Vogt31337: I agree with you. -- D — Preceding unsigned comment added by 192.55.54.42 (talk) 23:13, 28 June 2019 (UTC)[reply]
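The sign question can also be checked numerically. Below is a small sketch of my own (a hypothetical one-parameter model, not taken from the article): with J as the Jacobian of the model f, the damped update (JᵀJ + λI)δ = Jᵀ[y − f(β)], applied as β ← β + δ, drives the fit to the true parameter, so no extra minus sign is needed.

```python
import numpy as np

# Hypothetical model: f(x, beta) = exp(beta * x), data from beta_true = 0.5.
x = np.linspace(0.0, 2.0, 20)
beta_true = 0.5
y = np.exp(beta_true * x)

def model(beta):
    return np.exp(beta * x)

def jacobian(beta):
    # df/dbeta, an m-by-1 matrix: J is the Jacobian of the model f, not of the residual.
    return (x * np.exp(beta * x)).reshape(-1, 1)

def sum_sq(beta):
    r = y - model(beta)
    return float(r @ r)

beta = 0.1   # deliberately poor starting guess
lam = 1e-3   # small, fixed damping parameter for this sketch
for _ in range(20):
    J = jacobian(beta)
    r = y - model(beta)
    # Damped normal equations: (J^T J + λI) δ = J^T [y − f(β)], then β ← β + δ.
    delta = np.linalg.solve(J.T @ J + lam * np.eye(1), J.T @ r)
    beta = beta + float(delta[0])
# beta has now converged to beta_true; with the extra minus sign it would diverge.
```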

Transpose


Where f and p are introduced:

fᵀ = (f1, ..., fm), and pᵀ = (p1, ..., pn).

they are presented with superscript T, which I think indicates that they are transposed. But f extends over 1..m, and p extends over 1..n, so I don't understand why both are transposed. Can someone please explain what I am missing here? Thanks. --CarlManaster 22:49, 15 December 2005 (UTC)[reply]

I think the reasoning is that (f1, ..., fm) is a row vector. However, we want f to be a column vector, so the transpose is used to convert from rows to columns. It is perhaps clearer to write
f = (f1, ..., fm)ᵀ, and p = (p1, ..., pn)ᵀ.
To be honest, many mathematical texts do not bother to distinguish between row and column vectors, hoping that the reader can deduce from the context which one is meant.
Perhaps you know this, and your question is: why do we want to use a column vector? Well, f has to be a column vector because fᵀf further down the article should be an inner product, and p has to be a column vector because it's added to q, which is a column vector because otherwise the product Jq does not make sense. --Jitse Niesen (talk) 23:19, 15 December 2005 (UTC)[reply]

Improvement by Marquardt


What about the improvement Marquardt made to the algorithm? He replaced I by diag[H], i.e. the diagonal of the (approximated) Hessian, to incorporate some local curvature estimation. This makes the algorithm go further in directions of smaller gradient, to get out of narrow valleys on the error surface.

last entry by G.k. 13:46, 27 April 2006 (UTC)[reply]

I believe this is correct. It also has the important practical benefit of making λ dimensionless, so λ = 1 is (I think) a sensible general starting point.

A way to address this might be to add a section called (say) "Improved solution" after the "Choice of damping parameter" section, with the new formula. The original "Solution" section and this new one should probably be better referenced as well. Thoughts? I may give it a shot soon. Baccyak4H (Yak!) 18:29, 25 July 2007 (UTC)[reply]

I believe Marquardt's suggested improvement was actually to scale the approximate Hessian matrix, JᵀJ. The scaling had an effect similar to replacing the identity matrix with the diagonal of the Hessian approximation. Marquardt suggests scaling the Hessian approximation by an amount that makes the diagonal elements ones. Adding a constant diagonal matrix to the scaled matrix is similar to adding a proportion of the diagonal elements to the unscaled matrix. The scaling applied to the other elements of the Hessian approximation improves the condition of the matrix. I suggest the following:

q = Σ_J [ĴᵀĴ + λI]⁻¹ Ĵᵀ[y − f(p)]

where Σ_J is the square diagonal matrix:

Σ_J = [diag[JᵀJ]]^(−1/2)

and the scaled Jacobian, Ĵ, is:

Ĵ = J Σ_J

The square matrix ĴᵀĴ is then a scaled version of the squared Jacobian JᵀJ, where the scale factor is the root mean square of the columns of JᵀJ. The result of the scaling is ones on all the diagonal elements of ĴᵀĴ.

Note that more recent implementations that use the Levenberg–Marquardt tag do not include Marquardt's suggestion. It seems that clever choices of λ result in reasonable robustness and fewer function evaluations. Stephen.scott.webb (talk) 22:29, 4 February 2008 (UTC)[reply]
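The algebra above can be sanity-checked numerically: with Σ_J = [diag(JᵀJ)]^(−1/2) and Ĵ = JΣ_J, the scaled step q = Σ_J[ĴᵀĴ + λI]⁻¹Ĵᵀr comes out identical to solving Marquardt's diag-augmented normal equations [JᵀJ + λ diag(JᵀJ)]q = Jᵀr directly. A random-matrix check of my own (not from any reference):

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(30, 4))   # random Jacobian: m=30 residuals, n=4 parameters
r = rng.normal(size=30)        # residual vector, standing in for y - f(p)
lam = 0.1

A = J.T @ J
Sigma = np.diag(np.diag(A) ** -0.5)   # Sigma_J = [diag(J^T J)]^(-1/2)
J_hat = J @ Sigma                     # scaled Jacobian: diag(J_hat^T J_hat) is all ones

# Step from the scaled system, mapped back to the original parameters
q_scaled = Sigma @ np.linalg.solve(J_hat.T @ J_hat + lam * np.eye(4), J_hat.T @ r)

# Step from the diag-augmented normal equations on the unscaled matrix
q_diag = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)

assert np.allclose(q_scaled, q_diag)                   # the two formulations agree
assert np.allclose(np.diag(J_hat.T @ J_hat), 1.0)      # scaling gives unit diagonal
```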

Reference to nonlinear programming


Reference to nonlinear programming seems rather less appropriate here than reference to nonlinear regression. However, the article on the latter is inadequate (but developing). Dfarrar 09:08, 10 March 2007 (UTC)[reply]

explain terminology better


A short explanation (or a link to a page with an explanation) of the terminology f(t_i|p) would be in order here. I don't think it is common usage outside of statistical groups. — Preceding unsigned comment added by 130.225.102.1 (talk) 15:26, 25 September 2007 (UTC)[reply]

A major proposal


Please see talk:least squares#A major proposal for details. Petergans (talk) 17:39, 22 January 2008 (UTC)[reply]

Choice of damping parameter


- should discuss the approach of Moré, the standard used in most production-quality numerical libraries. It tends to be both faster and more robust than earlier, more naive methods. Jheald (talk) 22:26, 30 January 2008 (UTC)[reply]


Explanation of the poor performance of L-M in example


The data sequence has very high frequency components. The sparse sampling plan shown in the plots will necessarily contain huge aliasing error that is beyond any inverse algorithm. The way to correct this apparent poor performance of L-M is to sample the data train much more densely, e.g. in 0.001 step size; then L-M will find the correct answer easily under a very wide range of initial conditions. It is really not L-M that is at fault here. --Xzhou.lumi (talk) 22:32, 2 July 2008 (UTC)[reply]

Room for clarification


I think the article could do with some clarification; I know I could. Most optimization algorithms I have looked at focus on searching in Rⁿ for the minimum of some scalar-valued function on Rⁿ. In this article, that function is clearly S(β), but the article doesn't emphasize that as much as it emphasizes f(x_i, β).

As I try to fully understand L–M, I keep seeing it as a trust region algorithm in which we do a second-order Taylor expansion of S(β + δ) in the neighborhood of δ = 0. In that case we would have

S(β + δ) ≈ S(β) + grad(S)·δ + δᵀ Hessian(S) δ

It looks like L–M uses JᵀJ for the Hessian. Is this the exact Hessian or just an approximation? (It looks like an approximation, but I can't picture the difference between the true Hessian and the approximation.)

Now, the gradient of S with respect to β is given in this article as

grad(S) = −2(Jᵀ[y − f(β)])ᵀ.

With some hand waving, I can see this giving us the equation for the minimum of S of

(JᵀJ)δ = Jᵀ[y − f(β)]

Then it looks like the damping parameter basically defines the radius of the trust region, and that it isn't a hard-edged trust region, but rather the λI term modifies the cost function to keep the local minimum from being too far away (basically adding a paraboloid, scaled by λ, to S). Is that about right?

Finally, I don't quite see the intuition for using diag(DᵀD) instead of the identity.

Any answers/thoughts/corrections would be appreciated. Thanks. —Ben FrantzDale (talk) 22:04, 7 December 2008 (UTC)[reply]
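On the Hessian question above: JᵀJ is only an approximation. Writing S(β) = Σᵢ rᵢ² with rᵢ = yᵢ − f(xᵢ, β), the exact Hessian is 2(JᵀJ − Σᵢ rᵢ ∇²fᵢ); Gauss–Newton and L–M drop the second term, which vanishes as the residuals shrink. A one-parameter numerical sketch of my own (hypothetical model f = exp(b·x), with a constant offset in the data so the residuals stay nonzero):

```python
import numpy as np

# Hypothetical model f(x, b) = exp(b*x); offset data => nonzero residuals at b0.
x = np.linspace(0.0, 1.0, 10)
y = np.exp(0.7 * x) + 0.1

def S(b):
    r = y - np.exp(b * x)
    return float(r @ r)

b0 = 0.7
r = y - np.exp(b0 * x)                      # residuals (all 0.1 here)
J = (x * np.exp(b0 * x)).reshape(-1, 1)     # Jacobian of the model f
gn_hess = 2.0 * float(J.T @ J)              # Gauss-Newton / L-M approximation

# Exact Hessian of S: 2*(J^T J - sum_i r_i * d^2 f_i / db^2)
d2f = (x ** 2) * np.exp(b0 * x)
exact_hess = 2.0 * (float(J.T @ J) - float(r @ d2f))

# Central-difference second derivative of S as an independent check
h = 1e-5
num_hess = (S(b0 + h) - 2.0 * S(b0) + S(b0 - h)) / h**2

assert abs(num_hess - exact_hess) < 1e-3    # exact formula matches finite differences
assert abs(num_hess - gn_hess) > 0.5        # GN approximation drops the residual term
```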


In the Solution section there is the sentence:

>> Note that the gradient of S with respect to δ equals −2(Jᵀ[y − f(β)])ᵀ <<
Is it not supposed to be: ... gradient of S with respect to β ?? -Georg — Preceding unsigned comment added by 31.18.248.89 (talk) 20:45, 27 May 2014 (UTC)[reply]

Notes and references


There was a slight problem with the use of Notes and References. I have fixed it. --Михал Орела (talk) 20:50, 3 February 2009 (UTC)[reply]

Comparison of Efficacy and Performance with other similar Algorithms


Can anyone provide a section or information comparing the LMA with other algorithms, such as the conjugate gradient method? MxBuck (talk) 19:18, 23 September 2009 (UTC)[reply]

Jacobian J


I think there's a little mistake in the definition of J:

Shouldn't it be: J = dS(β)/dβ ?

This would also correspond to this (from the external links): http://www.siam.org/books/textbooks/fr18_book.pdf

We're trying to minimize the error, so taking the Jacobian of the original function wouldn't make sense. — Preceding unsigned comment added by 129.132.154.39 (talk) 10:42, 20 October 2011 (UTC)[reply]

Covariance of the solution


There should be discussion of the covariance of the solution. I think it's that if J is left-multiplied by the inverse of the input covariance, then JᵀΣ⁻¹J is the covariance of the output, or somesuch. —Ben FrantzDale (talk) 17:20, 26 October 2011 (UTC)[reply]

External links modified


Hello fellow Wikipedians,

I have just modified 2 external links on Levenberg–Marquardt algorithm. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. —InternetArchiveBot (Report bug) 02:18, 22 December 2017 (UTC)[reply]

Broken images


Hi, just a note to say that the image links on this page appear to be broken. Thanks. — Preceding unsigned comment added by 2A01:110:8012:1012:0:0:0:82 (talk) 11:59, 17 January 2018 (UTC)[reply]

Retrieved from "https://en.wikipedia.org/w/index.php?title=Talk:Levenberg–Marquardt_algorithm&oldid=1322557067"