epfml/dynamic-sparse-flash-attention


Code to reproduce results for the paper "Faster Causal Attention Over Large Sequences Through Sparse Flash Attention"

Setup

To install the required Python dependencies, first run:

pip install -r ./requirements.txt

Then install Triton:

git clone https://github.com/openai/triton.git
cd triton
git checkout b2a757d00028fe844a93904036a18e8670bfe92f
cd python
pip install cmake
pip install -e .

In the commands above, we pin Triton to the commit used in our experiments. Feel free to experiment with later Triton versions.
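
As an optional sanity check, assuming the editable install above completed without errors, you can confirm that Triton imports and print its version:

python -c "import triton; print(triton.__version__)"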

Reproducing our LM experiments on OpenWebText2

GPU requirements: Preferably, you need at least one A100. Some of our experiments use data parallelism with up to 3 A100s. You should have no problem running those experiments on any GPU supporting bfloat16, but you might have to change the model parameters to fit the available memory.

Go to the openwebtext2-experiments folder and run the script/train-LMs.sh script.
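
For example, assuming you start from the repository root and run the script with bash:

cd openwebtext2-experiments
bash script/train-LMs.sh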

Reproducing our runtime results

GPU requirements: We used one A100.

For the Hash-sparse and QK-sparse results, go to the runtime-experiments folder and check the timeperf-hash-and-qk-sparse.ipynb notebook.
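
For example, assuming Jupyter is available in your environment (it is not installed by the steps above), you can open the notebook with:

cd runtime-experiments
jupyter notebook timeperf-hash-and-qk-sparse.ipynb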

Reproducing our Reformer results

Coming soon
