Computer Science > Machine Learning
arXiv:2302.01172 (cs)
[Submitted on 2 Feb 2023]
Title:STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
Authors:Yucheng Lu,Shivani Agrawal,Suvinay Subramanian,Oleg Rybakov,Christopher De Sa,Amir Yazdanbakhsh
View a PDF of the paper titled STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition, by Yucheng Lu and 5 other authors
View PDFAbstract:Recent innovations on hardware (e.g. Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference. However, state-of-the-art learning recipes in this regime (e.g. SR-STE) are proposed for non-adaptive optimizers like momentum SGD, while incurring non-trivial accuracy drop for Adam-trained models like attention-based LLMs. In this paper, we first demonstrate such gap origins from poorly estimated second moment (i.e. variance) in Adam states given by the masked weights. We conjecture that learning N:M masks with Adam should take the critical regime of variance estimation into account. In light of this, we propose STEP, an Adam-aware recipe that learns N:M masks with two phases: first, STEP calculates a reliable variance estimate (precondition phase) and subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (mask-learning phase). STEP automatically identifies the switching point of two phases by dynamically sampling variance changes over the training trajectory and testing the sample concentration. Empirically, we evaluate STEP and other baselines such as ASP and SR-STE on multiple tasks including CIFAR classification, machine translation and LLM fine-tuning (BERT-Base, GPT-2). We show STEP mitigates the accuracy drop of baseline recipes and is robust to aggressive structured sparsity ratios.
Subjects: | Machine Learning (cs.LG) |
Cite as: | arXiv:2302.01172 [cs.LG] |
(orarXiv:2302.01172v1 [cs.LG] for this version) | |
https://doi.org/10.48550/arXiv.2302.01172 arXiv-issued DOI via DataCite |
Full-text links:
Access Paper:
- View PDF
- TeX Source
- Other Formats
View a PDF of the paper titled STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition, by Yucheng Lu and 5 other authors
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer(What is the Explorer?)
Connected Papers(What is Connected Papers?)
Litmaps(What is Litmaps?)
scite Smart Citations(What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv(What is alphaXiv?)
CatalyzeX Code Finder for Papers(What is CatalyzeX?)
DagsHub(What is DagsHub?)
Gotit.pub(What is GotitPub?)
Hugging Face(What is Huggingface?)
Papers with Code(What is Papers with Code?)
ScienceCast(What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower(What are Influence Flowers?)
CORE Recommender(What is CORE?)
IArxiv Recommender(What is IArxiv?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community?Learn more about arXivLabs.