davidmrau/mixture-of-experts

PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538)
This repository contains a PyTorch re-implementation of the sparsely-gated MoE layer described in the paper Outrageously Large Neural Networks (https://arxiv.org/abs/1701.06538).
```python
from moe import MoE
import torch

# instantiate the MoE layer
model = MoE(input_size=1000, output_size=20, num_experts=10,
            hidden_size=66, k=4, noisy_gating=True)

X = torch.rand(32, 1000)

# train
model.train()
# forward
y_hat, aux_loss = model(X)

# evaluation
model.eval()
y_hat, aux_loss = model(X)
```
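During training, the returned aux_loss (the gating network's load-balancing term) is typically added to the task loss before backpropagation. Below is a minimal sketch of such a training step, assuming a classification objective and the constructor arguments shown above:

```python
import torch
from torch import nn, optim
from moe import MoE

# same layer configuration as in the snippet above
model = MoE(input_size=1000, output_size=20, num_experts=10,
            hidden_size=66, k=4, noisy_gating=True)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

X = torch.rand(32, 1000)              # dummy inputs
target = torch.randint(0, 20, (32,))  # dummy class labels (assumed classification setup)

model.train()
y_hat, aux_loss = model(X)
# add the auxiliary load-balancing loss to the task loss
loss = criterion(y_hat, target) + aux_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```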
To install the requirements run:
pip install -r requirements.txt
The file example.py contains a minimal working example illustrating how to train and evaluate the MoE layer with dummy inputs and targets. To run the example:
python example.py
The file cifar10_example.py contains a minimal working example on the CIFAR-10 dataset. It achieves an accuracy of 39% with arbitrary hyper-parameters and without full convergence. To run the example:
python cifar10_example.py
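The sketch below illustrates how the MoE layer can be wired up for CIFAR-10, with images flattened to vectors of length 3*32*32 and output_size=10; the hyper-parameters here are placeholders and may differ from those used in cifar10_example.py.

```python
import torch
from torch import nn, optim
from moe import MoE

# placeholder hyper-parameters; cifar10_example.py may use different values
model = MoE(input_size=3 * 32 * 32, output_size=10,
            num_experts=10, hidden_size=256, k=4, noisy_gating=True)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.rand(64, 3, 32, 32)    # stand-in for a CIFAR-10 batch
labels = torch.randint(0, 10, (64,))  # stand-in labels

model.train()
y_hat, aux_loss = model(images.view(images.size(0), -1))  # flatten 3x32x32 images
loss = criterion(y_hat, labels) + aux_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```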
In FastMoE: A Fast Mixture-of-Expert Training System, this implementation was used as a reference PyTorch implementation for single-GPU training.
The code is based on the TensorFlow implementation that can be found here.
```
@misc{rau2019moe,
    title={Sparsely-gated Mixture-of-Experts PyTorch implementation},
    author={Rau, David},
    journal={https://github.com/davidmrau/mixture-of-experts},
    year={2019}
}
```