[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op #7632

Merged
21 commits
7f7bac4  add cute_dsl nvfp4 linear.  (limin2021, Sep 8, 2025)
80c0a11  update  (limin2021, Sep 8, 2025)
d5e3bbe  test passed  (limin2021, Sep 9, 2025)
dd01693  remove pf.py  (limin2021, Sep 9, 2025)
4dd97f7  update.  (limin2021, Sep 9, 2025)
260fd1d  format.  (limin2021, Sep 9, 2025)
cceecdb  format.  (limin2021, Sep 9, 2025)
392bb9e  remove useless code.  (limin2021, Sep 9, 2025)
2dfc5b4  fix import error on ci.  (limin2021, Sep 9, 2025)
4e7ad71  fix ci error to add python>=3.12  (limin2021, Sep 9, 2025)
bc21b75  minor  (limin2021, Sep 9, 2025)
5330fe0  comments.  (limin2021, Sep 9, 2025)
7676535  copyright and cuda import  (limin2021, Sep 10, 2025)
f0b0eb3  move cute_dsl op and kernels to individual dir.  (limin2021, Sep 10, 2025)
2d72f21  recover tests.  (limin2021, Sep 15, 2025)
1b9d336  Merge branch 'main' into add-cute-dsl-nvfp4-linear-step-1  (limin2021, Sep 15, 2025)
19b154e  reorganize accord to jin's comment.  (limin2021, Sep 15, 2025)
eae62df  refine to tao and yuxian's comments.  (limin2021, Sep 15, 2025)
a945d74  fix  (limin2021, Sep 15, 2025)
5007426  fix.  (limin2021, Sep 15, 2025)
fd1850d  fix  (limin2021, Sep 15, 2025)
fix
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
limin2021 committed Sep 15, 2025
commit a945d7438b0d0213f6c8a46e7a0c55b99271a6bf
tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
8 changes: 6 additions & 2 deletions
@@ -1179,6 +1179,10 @@ def forward(
  sf_k = pad_up(real_k // sf_vec_size, 4)
  sf_n = pad_up(n, 128)

+ # the scaling tensor is 1D. we need to make sure it has been padded to the correct shape
+ assert a_sf_tensor.shape == (sf_m * sf_k)
+ assert b_sf_tensor.shape == (sf_n * sf_k)
+
  a_ptr = self.make_cute_dsl_global_pointer(a_tensor,
      cutlass.Float4E2M1FN, 32)
  b_ptr = self.make_cute_dsl_global_pointer(b_tensor,
@@ -1264,10 +1268,10 @@ def cute_dsl_nvfp4_gemm_blackwell(
  "trtllm::cute_dsl_nvfp4_gemm_blackwell",
  [cute_dsl_nvfp4_gemm_blackwell_runner],
  CuteDSLNVFP4BlackwellLinear.tuning_config,
- [input, weight, input_scale, weight_scale, alpha, output_dtype],
+ [input, weight, input_scale, weight_scale],
  )
  return cute_dsl_nvfp4_gemm_blackwell_runner(
- inputs=[input, weight, input_scale, weight_scale, alpha, output_dtype],
+ inputs=[input, weight, input_scale, weight_scale],
  tactic=best_tactic,
  )
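
Review note on the two assertions added in the first hunk: in Python, `(sf_m * sf_k)` is a parenthesized integer, not a one-element tuple, so comparing a tensor's `.shape` against it would not match a 1D shape. The sketch below illustrates this with plain tuples standing in for `torch.Size` (which is a tuple subclass). It makes several assumptions that are not confirmed by this commit: that `pad_up` rounds up to the next multiple, that `sf_m` is computed as `pad_up(m, 128)` by analogy with `sf_n`, and that `sf_vec_size` is 16. The helper `expected_sf_shape` is hypothetical and exists only for illustration.

```python
def pad_up(x: int, multiple: int) -> int:
    """Round x up to the next multiple (assumed behaviour of the diff's helper)."""
    return (x + multiple - 1) // multiple * multiple


def expected_sf_shape(rows: int, real_k: int, sf_vec_size: int = 16) -> tuple:
    """Hypothetical helper: expected 1D shape of a padded scale-factor tensor.

    rows        -- m for the activation scales, n for the weight scales
    real_k      -- unpadded reduction dimension
    sf_vec_size -- elements covered by one scale factor (16 is an assumption)
    """
    sf_rows = pad_up(rows, 128)              # assumed to mirror sf_n = pad_up(n, 128)
    sf_k = pad_up(real_k // sf_vec_size, 4)  # same expression as in the diff
    return (sf_rows * sf_k,)                 # trailing comma makes this a 1-tuple


shape = expected_sf_shape(rows=100, real_k=512)

# A 1D shape equals a one-element tuple ...
print(shape == (128 * 32,))
# ... but never a bare integer, which is what (sf_m * sf_k) evaluates to.
print(shape == (128 * 32))
```

If the intent is to check a 1D shape, comparing against `(sf_m * sf_k,)` or checking `a_sf_tensor.numel()` would express it directly; the later "fix" commits in this PR may already address this, which this commit's diff alone does not show.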
