flash-attention
Here are 33 public repositories matching this topic...
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Updated Feb 25, 2025 - Python
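Several of the model repos in this list can use FlashAttention through Hugging Face Transformers. As a minimal sketch only (the checkpoint name and generation settings are illustrative assumptions, not taken from the Qwen repo), a Qwen-style chat model can request the FlashAttention-2 backend at load time:

```python
# Minimal sketch: load a causal LM with the FlashAttention-2 backend in
# Hugging Face Transformers. Requires a CUDA GPU plus torch, transformers,
# accelerate, and the flash-attn package. The checkpoint name is an
# example placeholder, not prescribed by the repo above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,              # FlashAttention-2 needs fp16/bf16
    attn_implementation="flash_attention_2",
    device_map="auto",
)

inputs = tokenizer("What is FlashAttention?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```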
Chinese LLaMA-2 & Alpaca-2 LLMs (phase-2 project) + 64K long-context models.
Updated Sep 23, 2024 - Python
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Updated Feb 7, 2025 - Python
📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, parallelism, etc. 🎉🎉
Updated Mar 4, 2025
📚 200+ Tensor/CUDA Core kernels, ⚡️ flash-attn-mma, ⚡️ hgemm with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
Updated Mar 22, 2025 - Cuda
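The "98%~100% of cuBLAS/FA2 TFLOPS" claim is a throughput comparison against vendor kernels. A rough sketch of how such a baseline number is typically measured, using a generic PyTorch timing harness (not that repository's own benchmark), is:

```python
# Rough sketch: measure fp16 GEMM throughput of cuBLAS via torch.matmul;
# a custom kernel's TFLOPS is then reported as a fraction of this number.
# Generic harness, not the benchmark used by the repo above.
import torch

def hgemm_tflops(m, n, k, iters=50):
    a = torch.randn(m, k, dtype=torch.half, device="cuda")
    b = torch.randn(k, n, dtype=torch.half, device="cuda")
    for _ in range(10):            # warm-up so cuBLAS heuristics settle
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1e3 / iters
    return 2 * m * n * k / seconds / 1e12   # 2*M*N*K FLOPs per GEMM

if __name__ == "__main__":
    print(f"cuBLAS fp16 GEMM: {hgemm_tflops(4096, 4096, 4096):.1f} TFLOPS")
```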
FlashInfer: Kernel Library for LLM Serving
Updated Mar 24, 2025 - Cuda
MoBA: Mixture of Block Attention for Long-Context LLMs
Updated Mar 7, 2025 - Python
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
Updated Mar 20, 2025 - Python
[CVPR 2025] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Updated Jan 16, 2025 - Python
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
Updated Mar 23, 2025 - Cuda
Triton implementation of FlashAttention-2 that adds support for custom masks.
Updated Aug 14, 2024 - Python
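Custom masks are the feature stock FlashAttention kernels typically lack (causal masking is supported, arbitrary per-position masks are not). Purely as a point of reference for the semantics, and not that repository's Triton API, PyTorch's built-in scaled_dot_product_attention accepts an arbitrary boolean mask:

```python
# Reference sketch of attention with a custom mask via PyTorch's SDPA.
# Illustrates the semantics the Triton kernel above targets; it is not
# that repository's API.
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Custom mask: causal attention restricted to a sliding window of 16 tokens.
# True means "may attend"; every row keeps at least its own position.
idx = torch.arange(seq, device="cuda")
mask = (idx[:, None] >= idx[None, :]) & (idx[:, None] - idx[None, :] < 16)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # (batch, heads, seq, head_dim)
```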
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
Updated Feb 5, 2024 - Python
Performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
Updated Feb 27, 2025 - C++
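For comparison with the C++ interface benchmarked above, the flash-attn project also ships a Python interface. A minimal sketch of calling it, assuming the flash-attn package is installed and a supported CUDA GPU is available:

```python
# Minimal sketch of the flash-attn Python API for the standard
# (batch, seqlen, nheads, headdim) layout; inputs must be fp16 or bf16
# and live on a supported CUDA GPU.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```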
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA, using CUDA cores for the decoding stage of LLM inference.
Updated Mar 9, 2025 - C++
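The decoding stage differs from prefill in that each step attends a single new query token against the accumulated KV cache, so these kernels optimize a matrix-vector-like workload. A plain PyTorch sketch of one grouped-query attention (GQA) decode step, shown only to illustrate the shapes involved and not that repository's CUDA implementation:

```python
# Plain PyTorch sketch of one GQA decode step: a single query token per
# sequence attends over the full KV cache. Shapes-only illustration; the
# repository above implements this with hand-written CUDA kernels.
import torch

batch, q_heads, kv_heads, head_dim, cache_len = 2, 32, 8, 128, 4096
group = q_heads // kv_heads  # query heads sharing each KV head

q = torch.randn(batch, q_heads, 1, head_dim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, kv_heads, cache_len, head_dim, device="cuda", dtype=torch.float16)
v_cache = torch.randn(batch, kv_heads, cache_len, head_dim, device="cuda", dtype=torch.float16)

# Expand KV heads so each group of query heads sees its shared KV head.
k = k_cache.repeat_interleave(group, dim=1)  # (batch, q_heads, cache_len, head_dim)
v = v_cache.repeat_interleave(group, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5   # (batch, q_heads, 1, cache_len)
probs = torch.softmax(scores.float(), dim=-1).to(q.dtype)
out = probs @ v                                      # (batch, q_heads, 1, head_dim)
print(out.shape)
```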
Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.
Updated Nov 4, 2024 - Python
Python package for rematerialization-aware gradient checkpointing
Updated Oct 31, 2023 - Python
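Gradient checkpointing trades compute for memory: intermediate activations are discarded during the forward pass and recomputed (rematerialized) during backward. As general background rather than that package's API, PyTorch's built-in utility expresses the idea in a few lines:

```python
# Generic illustration of gradient checkpointing with PyTorch's built-in
# utility: the block's activations are not stored and are recomputed when
# backward runs. Background for the idea of rematerialization, not the
# API of the package above.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
x = torch.randn(8, 1024, device="cuda", requires_grad=True)

# Forward without saving the block's intermediates; they are rematerialized
# during .backward().
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)
```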
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Updated Mar 4, 2025 - Python
Utilities for efficient fine-tuning, inference and evaluation of code generation models
Updated Oct 3, 2023 - Python
A simple PyTorch implementation of flash multi-head attention.
Updated Feb 5, 2024 - Jupyter Notebook
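For readers comparing against such minimal implementations, a compact multi-head attention module can also be built around PyTorch's fused scaled_dot_product_attention, which can dispatch to a FlashAttention-style kernel when the inputs allow it. A sketch for illustration, not the notebook's own code:

```python
# Compact multi-head attention built on torch's fused SDPA; on CUDA with
# fp16/bf16 tensors it can dispatch to a FlashAttention-style kernel.
# Sketch for illustration, not the notebook's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlashMHA(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor, causal: bool = True) -> torch.Tensor:
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, s, d) -> (b, heads, s, head_dim)
        shape = (b, s, self.num_heads, d // self.num_heads)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
        return self.proj(out.transpose(1, 2).reshape(b, s, d))

mha = FlashMHA(512, 8).cuda().half()
x = torch.randn(2, 256, 512, device="cuda", dtype=torch.float16)
print(mha(x).shape)  # (2, 256, 512)
```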
🚀 Automated deployment stack for AMD MI300 GPUs with optimized ML/DL frameworks and HPC-ready configurations
Updated Nov 30, 2024 - Shell