@Bruce-Lee-LY

Pinned

  1. decoding_attention (Public)

    Decoding Attention is optimized specifically for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference (a minimal decode-attention sketch follows this list).

    C++ · 35 stars · 2 forks

  2. flash_attention_inference (Public)

    Benchmarks the performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

    C++ · 35 stars · 3 forks

  3. cuda_hgemm (Public)

    Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions (a WMMA sketch follows this list).

    CUDA · 374 stars · 76 forks

  4. cuda_hook (Public)

    Hooks CUDA-related dynamic libraries using automated code-generation tools (a dlsym-interposition sketch follows this list).

    C · 150 stars · 41 forks

  5. cuda_hgemv (Public)

    Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores (a warp-per-row sketch follows this list).

    CUDA · 59 stars · 5 forks

  6. cutlass_gemm (Public)

    Multiple GEMM operators constructed with CUTLASS to support LLM inference (a basic CUTLASS usage sketch follows this list).

    C++ · 17 stars · 2 forks
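
At the decode stage each step attends a single query token per head against the cached K/V sequence, so the work is memory-bound dot products plus a softmax, which is why CUDA cores (rather than Tensor Cores) are a natural fit. Below is a minimal sketch of that pattern in CUDA, assuming a half-precision per-head K/V layout and one thread block per head; the kernel name and layout are illustrative, not decoding_attention's actual code.

#include <cuda_fp16.h>
#include <math.h>

// One thread block per head. For head h (blockIdx.x):
//   q: [head_dim] query, k/v: [seq_len, head_dim] caches, out: [head_dim].
// Launch: <<<num_heads, 128, seq_len * sizeof(float)>>>.
__global__ void decode_attention_naive(const half *q, const half *k,
                                       const half *v, half *out,
                                       int seq_len, int head_dim,
                                       float scale) {
    extern __shared__ float score[];  // one score per cached position

    const half *qh = q + (size_t)blockIdx.x * head_dim;
    const half *kh = k + (size_t)blockIdx.x * seq_len * head_dim;
    const half *vh = v + (size_t)blockIdx.x * seq_len * head_dim;
    half *oh = out + (size_t)blockIdx.x * head_dim;

    // 1. Scaled dot product q . k[s] for each cached position s.
    for (int s = threadIdx.x; s < seq_len; s += blockDim.x) {
        float acc = 0.f;
        for (int d = 0; d < head_dim; ++d)
            acc += __half2float(qh[d]) *
                   __half2float(kh[(size_t)s * head_dim + d]);
        score[s] = acc * scale;
    }
    __syncthreads();

    // 2. Softmax statistics (serial in thread 0 for brevity; a real
    //    kernel would use a parallel reduction here).
    __shared__ float s_max, s_sum;
    if (threadIdx.x == 0) {
        float m = -INFINITY, sum = 0.f;
        for (int s = 0; s < seq_len; ++s) m = fmaxf(m, score[s]);
        for (int s = 0; s < seq_len; ++s) sum += expf(score[s] - m);
        s_max = m;
        s_sum = sum;
    }
    __syncthreads();

    // 3. Output: out[d] = sum_s softmax(score)[s] * v[s][d].
    for (int d = threadIdx.x; d < head_dim; d += blockDim.x) {
        float acc = 0.f;
        for (int s = 0; s < seq_len; ++s)
            acc += expf(score[s] - s_max) / s_sum *
                   __half2float(vh[(size_t)s * head_dim + d]);
        oh[d] = __float2half(acc);
    }
}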

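The WMMA API in cuda_hgemm's description is nvcuda::wmma, which drives Tensor Cores through 16x16x16 matrix fragments. The textbook starting point looks like the sketch below (one warp per 16x16 output tile, FP16 inputs, FP32 accumulation, M/N/K assumed to be multiples of 16); the repo's optimized kernels layer tiling, shared memory, and MMA PTX well beyond this baseline.

#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// C = A * B. A: M x K row-major, B: K x N column-major, C: M x N row-major.
// One warp computes one 16x16 tile of C; launch with blockDim.x a multiple
// of 32, e.g. dim3 block(128, 4) and a grid covering M/16 x N/16 tiles.
__global__ void wmma_hgemm_naive(const half *A, const half *B, float *C,
                                 int M, int N, int K) {
    int warp_m = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warp_n = blockIdx.y * blockDim.y + threadIdx.y;
    if (warp_m * 16 >= M || warp_n * 16 >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // March along K in 16-wide steps, accumulating into the C fragment.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + warp_m * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + warp_n * 16 * K + k, K);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C + warp_m * 16 * N + warp_n * 16, c_frag,
                            N, wmma::mem_row_major);
}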

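The mechanism behind cuda_hook is standard dynamic-library interposition; the repo automates generating the shims. A hand-written sketch of the idea for a single symbol (cudaMalloc), with the CUDA error type stubbed so the example is self-contained — the generated code and symbol coverage in the repo will differ:

#define _GNU_SOURCE 1  // for RTLD_NEXT on glibc
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

// Stand-in for the enum in cuda_runtime_api.h so this sketch builds
// without CUDA headers; the real cudaError_t is ABI-compatible with int.
typedef int cudaError_t;

// Interposed cudaMalloc: log the request, then forward to the real
// symbol, resolved once with dlsym(RTLD_NEXT, ...).
extern "C" cudaError_t cudaMalloc(void **devPtr, size_t size) {
    typedef cudaError_t (*cudaMalloc_fn)(void **, size_t);
    static cudaMalloc_fn real_fn =
        (cudaMalloc_fn)dlsym(RTLD_NEXT, "cudaMalloc");

    fprintf(stderr, "[hook] cudaMalloc(%zu bytes)\n", size);
    return real_fn(devPtr, size);
}

Built with g++ -shared -fPIC hook.cpp -o libhook.so -ldl and loaded via LD_PRELOAD=./libhook.so, this logs every cudaMalloc before forwarding it. Note that interposition only intercepts calls when the application links libcudart dynamically; a statically linked runtime never goes through the dynamic linker.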
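
HGEMV (y = A x) is bandwidth-bound, and a common CUDA-core strategy is to assign one warp per output row, each lane accumulating a strided slice of the dot product before a warp-shuffle reduction. A minimal sketch of that assignment (illustrative, not cuda_hgemv's kernels):

#include <cuda_fp16.h>

// y = A * x with A: M x N row-major, all FP16, FP32 accumulation.
// One warp per row; launch e.g. with 128-thread blocks and
// grid.x = ceil(M / 4.0) so each block covers four rows.
__global__ void hgemv_warp_per_row(const half *A, const half *x, half *y,
                                   int M, int N) {
    int row  = blockIdx.x * (blockDim.x / warpSize) + threadIdx.x / warpSize;
    int lane = threadIdx.x % warpSize;
    if (row >= M) return;

    // Each lane sums a strided slice of the row's dot product.
    float acc = 0.f;
    for (int col = lane; col < N; col += warpSize)
        acc += __half2float(A[(size_t)row * N + col]) * __half2float(x[col]);

    // Reduce the 32 partial sums with warp shuffles.
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffff, acc, offset);

    if (lane == 0) y[row] = __float2half(acc);
}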
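
CUTLASS composes GEMMs from templates, and its device-level entry point is cutlass::gemm::device::Gemm. The sketch below is patterned after CUTLASS's basic GEMM example; the element types, layouts, and defaults shown are assumptions for illustration, not the operator set cutlass_gemm actually instantiates.

#include <cutlass/gemm/device/gemm.h>

// FP16 GEMM with column-major A/B/C and default tile shapes/epilogue.
// With these defaults the accumulator and epilogue scalars are half_t.
using Gemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::ColumnMajor,   // A
    cutlass::half_t, cutlass::layout::ColumnMajor,   // B
    cutlass::half_t, cutlass::layout::ColumnMajor>;  // C and D

// Computes D = alpha * A * B + beta * C, writing D over C in place.
cutlass::Status run_gemm(int M, int N, int K,
                         cutlass::half_t const *A, int lda,
                         cutlass::half_t const *B, int ldb,
                         cutlass::half_t *C, int ldc,
                         cutlass::half_t alpha, cutlass::half_t beta) {
    Gemm gemm_op;
    Gemm::Arguments args({M, N, K},       // problem size (GemmCoord)
                         {A, lda},        // TensorRef for A
                         {B, ldb},        // TensorRef for B
                         {C, ldc},        // source C
                         {C, ldc},        // destination D
                         {alpha, beta});  // linear-combination epilogue
    return gemm_op(args);                 // checks, initializes, launches
}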