TMA pointwise scheduler tests #5565

Draft

liqiangxl wants to merge 25 commits into main from llu/pt5_test

Conversation

liqiangxl (Collaborator) commented Nov 20, 2025:

Enable and run CI tests.

liqiangxl and others added 20 commits, November 18, 2025 07:51:

- Move common code shared by TMA and non-TMA version to pointwise utils (bfb8add)
- add auto tma scheduler (6d36bb1)
- revise to pre-hopper (4ec29b0)
- comment (0cf29cc)
- add bcast (83a809e)
- add fallback (0177976)
- merge conditions 1 and 2 (3862e82)
- clean (8b83a8a)
- Merge branch 'main' into llu/pt3_auto1 (a5e7287)
- Update csrc/scheduler/pointwise_tma.cpp (20f59c1)
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- wip (55c01fc)
- revise (d3c236f)
- Merge branch 'main' into llu/pt3_auto1 (d56df3c)
- revise (e2e124c)
- Merge branch 'llu/pt3_auto1' into llu/pt3_auto2_bcast (0b5bd62)
- don't inline non-tma loaded tvs (48c99d8)
- check cacheable uses (1f6063b)
- enable tma (46757a2)
- check contig (db2e66a)
- tests (e7e012c)

liqiangxl (Collaborator, Author) commented Nov 20, 2025:

!test

github-actions bot commented Nov 20, 2025 (edited by xwang233):

Review updated until commit a21ff37

Description

- Implement a TMA (Tensor Memory Accelerator) scheduler for pointwise operations
- Add automatic TMA detection with fallback to the non-TMA scheduler
- Refactor common scheduling utilities into pointwise_utils
- Add comprehensive TMA tests covering broadcast and vectorization scenarios
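The "automatic TMA detection and fallback" idea can be illustrated with a small, self-contained sketch. This is not the code in this PR: the names `DType`, `TmaInput`, `mayUseTmaSketch`, and `chooseScheduler` are hypothetical, and the eligibility rules are a simplified assumption based on general TMA constraints (an sm_90 or newer device, a dtype that has a CUDA tensor-map encoding, a contiguous innermost dimension, and an innermost row whose byte size is a multiple of 16, since tensor-map strides must be 16-byte aligned). The real checks in pointwise_tma.cpp and utils.cpp may differ.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified dtype enum for illustration only (not nvFuser's DataType).
enum class DType { Float16, BFloat16, Float32, Float64, Int32, Int64, ComplexFloat, ComplexDouble };

// Hypothetical per-input summary used by the sketch.
struct TmaInput {
  DType dtype;
  bool innermost_contiguous;  // stride of the innermost dimension is 1
  int64_t inner_extent;       // number of elements in the innermost dimension
};

int64_t elementBytes(DType t) {
  switch (t) {
    case DType::Float16:
    case DType::BFloat16: return 2;
    case DType::Float32:
    case DType::Int32: return 4;
    case DType::Float64:
    case DType::Int64:
    case DType::ComplexFloat: return 8;
    case DType::ComplexDouble: return 16;
  }
  return 0;
}

// CUDA tensor maps (CUtensorMapDataType) have no complex data type,
// so complex inputs cannot be loaded through TMA.
bool hasTensorMapEncoding(DType t) {
  return t != DType::ComplexFloat && t != DType::ComplexDouble;
}

// Assumed, simplified eligibility rules. An empty input list is treated
// as "no TMA" because there is nothing to load.
bool mayUseTmaSketch(int sm_major, const std::vector<TmaInput>& inputs) {
  if (sm_major < 9 || inputs.empty()) {
    return false;
  }
  for (const TmaInput& in : inputs) {
    if (!hasTensorMapEncoding(in.dtype) || !in.innermost_contiguous) {
      return false;
    }
    // Row size in bytes must be a multiple of 16 for a valid tensor-map stride.
    if ((in.inner_extent * elementBytes(in.dtype)) % 16 != 0) {
      return false;
    }
  }
  return true;
}

// Fall back to the non-TMA pointwise scheduler whenever TMA is not applicable.
std::string chooseScheduler(int sm_major, const std::vector<TmaInput>& inputs) {
  return mayUseTmaSketch(sm_major, inputs) ? "pointwise_tma" : "pointwise_non_tma";
}
```

For example, `chooseScheduler(9, {{DType::Float32, true, 1024}})` selects the TMA path, while the same input on an sm_80 device, or a complex-typed input, falls back to the non-TMA scheduler. Checking dtype support before building a tensor map is one way to keep complex inputs away from the tensor-map encoder, which is where the "Unknown tensor map data type" CI failures originate.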

Changes walkthrough

Relevant files

Enhancement (9 files)

pointwise.cpp `Add TMA auto-detection and scheduler integration`	+62/-4
pointwise_non_tma.cpp `Refactor to use common scheduling utilities`	+32/-131
pointwise_tma.cpp `Implement complete TMA scheduler with heuristics`	+385/-4
pointwise_utils.cpp `Add common break point and block/grid utilities`	+181/-0
utils.cpp `Add TMA utility functions and cacheable uses detection`	+105/-46
pointwise_heuristic.h `Add TMA-specific parameters to PointwiseParams`	+7/-0
pointwise_tma.h `Update TMA scheduler function signatures`	+2/-1
pointwise_utils.h `Add break point and block/grid configuration structures`	+38/-0
utils.h `Add TMA utility function declarations`	+31/-6

Tests (4 files)

test_gpu2.cpp `Update vectorization test for TMA compatibility`	+9/-1
test_pointwise.cpp `Add comprehensive TMA scheduler tests`	+432/-62
test_resize.cpp `Update resize test for TMA/non-TMA handling`	+9/-7
test_vectorization.cpp `Update vectorization tests for TMA support`	+3/-3

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Debug output: There are debug print statements (std::cout) at lines 294-295 that should be removed or guarded by a debug flag before merging to main:
`std::cout << "reference_tv:" << reference_tv->toString() << std::endl;`
`reference_tv->printTransforms();`

Test naming: Line 463 has a typo in the test name: "VIssue1567ectorizationFactorAnalysisCase3" should be "Issue1567VectorizationFactorAnalysisCase3".
`at::Tensor t1 = at::randn({512, 1024, 2}, options);`
`// NOTE force pointwise scheduler here just for testing purpose`

Commented code: Lines 400-402 contain commented-out code that should be cleaned up or removed before merging:
`// bool use_tma = mayUseTma(prop, runtime_info) &&`
`// isOptionEnabled(EnableOption::TmaPointwise); for CI testing, use tma always`
`// if possible`

Test failures

(Medium, 90) nvFuser internal assert (Unknown tensor map data type) in the test_direct_ops opinfo suite

Test Name	GB200	H100
tests.python.direct.test_repro.test_issue1277	❌	❌
tests.python.opinfo.test_direct_ops.test_correctness_abs_complex128		❌
tests.python.opinfo.test_direct_ops.test_correctness_abs_complex64	❌	❌
tests.python.opinfo.test_direct_ops.test_correctness_acos_complex128		❌
tests.python.opinfo.test_direct_ops.test_correctness_acos_complex64	❌	❌
tests.python.opinfo.test_direct_ops.test_correctness_acosh_complex128	❌	❌
tests.python.opinfo.test_direct_ops.test_correctness_acosh_complex64	❌	❌
tests.python.opinfo.test_direct_ops.test_correctness_add_complex128		❌
tests.python.opinfo.test_direct_ops.test_correctness_add_complex64		❌
tests.python.opinfo.test_direct_ops.test_correctness_asin_complex128		❌
... with 57 more test failures omitted. Check internal logs.

(Medium, 46) nvFuser internal assert: Unknown tensor map data type on complex-dtype ops (opinfo direct and UnaryTests)

Test Name	GB200	H100	Source
UnaryTests/UnaryTest.Neg/std__complex_float_	❌	❌	Link
tests.python.opinfo.test_direct_ops.test_correctness_abs_complex128	❌
tests.python.opinfo.test_direct_ops.test_correctness_acos_complex128	❌
tests.python.opinfo.test_direct_ops.test_correctness_add_complex128	❌
tests.python.opinfo.test_direct_ops.test_correctness_add_complex64	❌
tests.python.opinfo.test_direct_ops.test_correctness_asin_complex128	❌
tests.python.opinfo.test_direct_ops.test_correctness_asin_complex64	❌
tests.python.opinfo.test_direct_ops.test_correctness_asinh_complex128	❌
tests.python.opinfo.test_direct_ops.test_correctness_asinh_complex64	❌
tests.python.opinfo.test_direct_ops.test_correctness_atan_complex128	❌
... with 35 more test failures omitted. Check internal logs.

(Medium, 12) nvFuser internal assertion failures in BlockQuantizationSchedulingTestSuite and MatmulSchedulerTest

Test Name	GB200	Source
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/__bfloat_1024x1024_WithGlobalScale_NoSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/__bfloat_128x64_NoGlobalScale_WithSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/__bfloat_2048x128_NoGlobalScale_NoSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/__bfloat_2048x128_WithGlobalScale_WithSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/__bfloat_2048x2048_WithGlobalScale_NoSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/float_1024x1024_NoGlobalScale_NoSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/float_1024x1024_WithGlobalScale_WithSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/float_128x64_WithGlobalScale_NoSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/float_2048x128_NoGlobalScale_WithSwizzle	❌	Link
BlockQuantizationSchedulingTestSuite/BlockQuantizationSchedulingTest.AutoScheduleSingleOp/float_2048x2048_NoGlobalScale_NoSwizzle	❌	Link
... with 2 more test failures omitted. Check internal logs.

(Medium, 9) Multiple nvFuser internal assertion failures across grouped_mm, multidevice matmul/transformer, and thunderfx MoE tests

Test Name	GB200	GB200 (dist.)	H100	H100 (dist.)
tests.python.direct.test_with_id_model_indexer.test_layout_op_and_cutlass_nvfp4_grouped_mm[out_dtype=torch.bfloat16-tokens_per_expert_neg_one=[115, 144, 8]-config=[1024, 128, 256]]	❌
tests.python.multidevice.test_matmul.test_linear_reduce_scatter		❌		❌
tests.python.multidevice.test_matmul.test_sequence_parallel_linear		❌		❌
tests.python.multidevice.test_transformer.test_grouped_mlp		❌		❌
tests.python.test_moe.test_llama4_moe_thunderfx	❌		❌

(Medium, 8) nvFuser TMA analysis internal asserts (merge-discontiguous / extent divisibility) in PointwiseTest, ResizeTest, matmul_stride, and issue1953 suites

Test Name	GB200	H100	Source
PointwiseTest.VIssue1567ectorizationFactorAnalysisCase3	❌	❌	Link
ResizeTest.PadAndCacheUses	❌	❌	Link
tests.python.direct.test_matmul.test_matmul_stride	❌	❌
tests.python.direct.test_repro.test_issue1953	❌	❌

(Medium, 2) nvFuser internal input-size assert in test_schedule_ops::TestScheduleOps.test_concretize_reshape_pointwise

Test Name	GB200	H100	Source
tests.python.test_schedule_ops.TestScheduleOps.test_concretize_reshape_pointwise	❌	❌

(Medium, 2) nvFuser split-after-parallelization assertion in multidevice transformer tests

Test Name	GB200	H100	Source
tests.python.multidevice.test_transformer.test_grouped_mlp	❌	❌

(Medium, 2) Heuristic string mismatch in test_tutorial_compute_heuristics_and_schedule

Test Name	GB200	H100	Source
tests.python.direct.test_tutorial.test_tutorial_compute_heuristics_and_schedule	❌	❌

(Medium, 1) nvFuser pointwise heuristic unroll factor mismatch in PointwiseTest

Test Name	GB200	Source
PointwiseTest.Heuristicst1Compute2Unroll4	❌	Link

liqiangxl added 2 commits, November 20, 2025 12:26:

- fix (af36bc0)
- use bit (5d20038)

liqiangxl (Collaborator, Author) commented Nov 20, 2025:

!test

- fix (6d2f424)

liqiangxl (Collaborator, Author) commented Nov 21, 2025:

!test

- check one by one (857c5b8)

liqiangxl (Collaborator, Author) commented Nov 21, 2025:

!test

- consider breakpoint (a21ff37)

liqiangxl (Collaborator, Author) commented Nov 21, 2025:

!test

Labels: None yet

2 participants
