- Notifications
You must be signed in to change notification settings - Fork251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
New PR for Nesterov change#47
Conversation
coveralls commentedJan 21, 2015
description for future reference: This PR modified the implementation of Nesterov solver so that the formulas are more consistent with the cited paper (although the old implementation is in an equivalent form). |
Out of curiosity, could you show me how the previous implementation was equivalent? I couldn't make it work out when I tried. The code didn't use the last_momentum variable which seemed to be necessary. |
@the-moliver Here is my derivation. Sorry I did not use the standard notation in the paper (h means history). Correct me if I'm wrong: |
Yup, your derivation looks correct. I'm surprised they didn't use that form in the paper. The only difference with standard momentum then is that the parameters are updated with the history and gamma at time t, rather than t-1, which is kinda cool. |
I'll remove the old PR in favor of this one