Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix apparent copy-paste bug in log_softmax reduced-precision fp kernel#156379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
swolchok wants to merge2 commits intogh/swolchok/781/basefromgh/swolchok/781/head

Conversation

@swolchok
Copy link
Contributor

@swolchokswolchok commentedJun 18, 2025
edited by pytorch-botbot
Loading

Stack fromghstack (oldest at bottom):

This looks like a bug. Check if trying to fix it breaks existing tests; if not, will look into why no test coverage caught it

cc@jgong5@mingfeima@XiaobingSuper@sanchitintel@ashokei@jingxu10@jerryzh168

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-botbot commentedJun 18, 2025
edited
Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results athud.pytorch.org/pr/156379

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commitd1f4041 with merge base6303cc4 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-botpytorch-botbot added the module: cpuCPU specific problem (e.g., perf, algorithm) labelJun 18, 2025
swolchok added a commit that referenced this pull requestJun 18, 2025
This looks like a bug. Check if trying to fix it breaks existing tests; if not, will look into why no test coverage caught itghstack-source-id:c35a9efPull-Request:#156379
@swolchokswolchok added the topic: bug fixestopic category labelJun 19, 2025
@cyyever
Copy link
Collaborator

Why the kineto changes?

@swolchok
Copy link
ContributorAuthor

Why the kineto changes?

whoops, think I just forgot to update submodules when I pulled main

@swolchok
Copy link
ContributorAuthor

swolchok commentedJun 20, 2025
edited
Loading

I've been stressing out trying to figure out why I can't detect this bug in testing. Apparently, this is because the actual value of the intermediate max used doesn't matter for "normal" values; it's a numerical accuracy thing called "safe softmax" and it cancels out (see e.g. "Why Safe Softmax Doesn't Change the Result"here). I imagine we could try to contrive a very specific case where it matters, but probably we should just fix it and move on.

[ghstack-poisoned]
swolchok added a commit that referenced this pull requestJun 20, 2025
This looks like a bug. Check if trying to fix it breaks existing tests; if not, will look into why no test coverage caught itghstack-source-id:beb8e09Pull-Request:#156379
max_fvec0 =fVec::blendv(max_fvec0, data_fvec0, data_fvec0 > max_fvec0);
max_fvec1 =fVec::blendv(max_fvec1, data_fvec1, data_fvec1 > max_fvec1);
max_fvec0.store(input_max_data + d1);
max_fvec0.store(input_max_data + d1 +fVec::size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OMG I EVEN SAW THIS BEFORE AND GOT confused why we were storing twice into max_fvec0? But hmmm why do we store twice into min_fvec and zero_fvec in lines 1028-1031 above?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I spent some time last night trying to understand how this kernel works (by interrogating the LLM of my choice about it since I am below average at intuitively understanding array indexing code without lots of pictures for some reason), so I actually can answer that! Lines 1028-1031 are storing the identities for the max and sum reductions into our array of accumulators.

We're making a non-contiguous reduction vectorizable anyway by doing anarray of reductions all at once. (We can note that outer_size is basically just a batch dimension and ignore it for purposes of understanding the kernel.) We slice up the inner dimensions into chunks of length CHUNK_SIZE a; the inner loops are doing CHUNK_SIZE reductions at once. Accordingly, they have an array of CHUNK_SIZE accumulators, which is what's getting initialized in lines 1028-1031. since the CHUNK_SIZE dimension is contiguous, we can vectorize along it and get "parallelization" that way through vector arithmetic. The blocking/chunking stuff is so that the dim_size x CHUNK_SIZE "vertical panel" that we hand to each thread fits in cache, since softmax will end up reading the data 3 times (for each inner loop -- max, sum + log, and data - logsum - max).

@swolchokswolchok added the release notes: cpu (x86)release notes category for x86, intel, etc. labelJun 20, 2025
@swolchokswolchok requested a review fromjaneyx99June 20, 2025 16:41
Copy link
Contributor

@janeyx99janeyx99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the explanation, though I haven't yet absorbed it all

@janeyx99
Copy link
Contributor

@pytorchbot merge

pytorch-bot[bot] reacted with thumbs up emoji

@pytorch-botpytorch-botbot added the ciflow/trunkTrigger trunk jobs on your pull request labelJun 20, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in thewiki.

Questions? Feedback? Please reach out to thePyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@github-actionsgithub-actionsbot deleted the gh/swolchok/781/head branchJuly 21, 2025 02:20
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@janeyx99janeyx99janeyx99 approved these changes

Assignees

No one assigned

Labels

ciflow/trunkTrigger trunk jobs on your pull requestMergedmodule: cpuCPU specific problem (e.g., perf, algorithm)release notes: cpu (x86)release notes category for x86, intel, etc.topic: bug fixestopic category

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

5 participants

@swolchok@cyyever@janeyx99@pytorchmergebot

[8]ページ先頭

©2009-2025 Movatter.jp