PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al.: https://arxiv.org/abs/1701.06538
This repository contains a PyTorch re-implementation of the sparsely-gated MoE layer described in the paper Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (https://arxiv.org/abs/1701.06538).
```python
from moe import MoE
import torch

# instantiate the MoE layer
model = MoE(input_size=1000, output_size=20, num_experts=10, hidden_size=66, k=4, noisy_gating=True)

X = torch.rand(32, 1000)

# train
model.train()
# forward
y_hat, aux_loss = model(X)

# evaluation
model.eval()
y_hat, aux_loss = model(X)
```
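The forward pass returns both the prediction and an auxiliary load-balancing loss, which during training is typically added to the task loss before back-propagation. The following is a minimal sketch of one training step under that assumption; the cross-entropy criterion, the Adam optimizer, and the random labels are illustrative choices and are not prescribed by the layer itself:

```python
import torch
from torch import nn, optim

from moe import MoE

# same hyper-parameters as in the usage example above
model = MoE(input_size=1000, output_size=20, num_experts=10,
            hidden_size=66, k=4, noisy_gating=True)

# illustrative task loss and optimizer (not fixed by the MoE layer)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# dummy batch: 32 inputs of size 1000 with random class labels in [0, 20)
X = torch.rand(32, 1000)
y = torch.randint(0, 20, (32,))

model.train()
optimizer.zero_grad()
y_hat, aux_loss = model(X)             # prediction and load-balancing loss
loss = criterion(y_hat, y) + aux_loss  # add the auxiliary loss to the task loss
loss.backward()
optimizer.step()
```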
To install the requirements run:
pip install -r requirements.txt
The file example.py contains a minimal working example illustrating how to train and evaluate the MoE layer with dummy inputs and targets. To run the example:
python example.py
The file cifar10_example.py contains a minimal working example on the CIFAR-10 dataset. With arbitrary hyper-parameters and without full convergence, it reaches an accuracy of 39%. To run the example:
python cifar10_example.py
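Since the MoE layer is fully connected, image inputs have to be flattened before the forward pass. The sketch below shows one way to wire the layer up for CIFAR-10; the hidden size, number of experts, k, loss, and optimizer are placeholder choices and do not necessarily match the hyper-parameters used in cifar10_example.py:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from moe import MoE

# CIFAR-10 images are 3x32x32 -> 3072 features when flattened; 10 output classes
model = MoE(input_size=3 * 32 * 32, output_size=10, num_experts=10,
            hidden_size=256, k=4, noisy_gating=True)

criterion = nn.CrossEntropyLoss()                    # illustrative task loss
optimizer = optim.Adam(model.parameters(), lr=1e-3)

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model.train()
for images, labels in loader:
    inputs = images.view(images.size(0), -1)          # flatten each image to a 3072-dim vector
    optimizer.zero_grad()
    y_hat, aux_loss = model(inputs)
    loss = criterion(y_hat, labels) + aux_loss        # combine task and load-balancing losses
    loss.backward()
    optimizer.step()
```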
This implementation was used as a reference PyTorch implementation for single-GPU training in FastMoE: A Fast Mixture-of-Expert Training System.
The code is based on the TensorFlow implementation that can be found here.
```bibtex
@misc{rau2019moe,
    title={Sparsely-gated Mixture-of-Experts PyTorch implementation},
    author={Rau, David},
    journal={https://github.com/davidmrau/mixture-of-experts},
    year={2019}
}
```