Fix MHAEinsum weight dimension bug when d_in != d_out (#857) #893
Previously MHAEinsum initialized weight matrices with shape (d_out, d_in) and used inappropriate einsum notation, causing failures for non-square input-output dimensions. This commit corrects weight initialization to shape (d_in, d_out), updates einsum notation to 'bnd,do->bno', and adds three unit tests to verify parity across different d_in and d_out settings. All tests pass successfully.
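A minimal sketch of the shape issue described above, assuming NumPy as a stand-in for the PR's PyTorch code (`torch.einsum` accepts the same subscript string). The sizes and variable names are hypothetical, and this is not the actual `MHAEinsum` implementation; it only illustrates why a `(d_in, d_out)` weight works with `'bnd,do->bno'` while a `(d_out, d_in)` weight fails once `d_in != d_out`.

```python
import numpy as np

# Hypothetical sizes chosen so that d_in != d_out
b, n, d_in, d_out = 2, 5, 4, 6
rng = np.random.default_rng(0)
x = rng.standard_normal((b, n, d_in))  # (batch, tokens, d_in)

# Corrected layout: weight of shape (d_in, d_out), contracted over index d
W = rng.standard_normal((d_in, d_out))
out = np.einsum("bnd,do->bno", x, W)

# The einsum is the same projection as a plain matrix product
assert out.shape == (b, n, d_out)
assert np.allclose(out, x @ W)

# Old layout: a (d_out, d_in) weight no longer lines up with index d,
# so the contraction fails for non-square dimensions
W_old = rng.standard_normal((d_out, d_in))
try:
    np.einsum("bnd,do->bno", x, W_old)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

With square dimensions (`d_in == d_out`) both layouts would run without error, which is presumably why the bug only surfaced for non-square settings.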
rasbt commented Oct 27, 2025
Thanks a lot for the PR and sorry about the late response, I was out of town last week. I'll have a look soon.
rasbt commented Oct 27, 2025
The fix looks great, and thanks for adding those tests. I just moved over the tests to a separate python script for pytest similar to what I've done with some other notebooks here. This way, it's easier to test via the CI runners, and it keeps the code notebook more readable.
rasbt left a comment
Overall looks good to me, thanks for the PR!
.gitignore Outdated
#Ignore vscode AI rules
.github/instructions/codacy.instructions.md
I just saw this newly added entry. I assume this is for people using vibe coding apps? Maybe it can be removed, as it's not related to the PR.
27d52d6 into rasbt:main

Fixing the issue #857