daochenzha/dreamshard


[NeurIPS 2022] DreamShard: Generalizable Embedding Table Placement for Recommender Systems

License

MIT license


Folders and files

dreamshard/
imgs/
tools/
.gitignore
LICENSE
README.md
eval.py
requirements.txt
setup.py
train.py


DreamShard

This is the implementation of the paper DreamShard: Generalizable Embedding Table Placement for Recommender Systems. We propose DreamShard, a reinforcement learning approach for embedding table placement. DreamShard has two novel ideas: 1) it learns a cost network to directly predict the costs of the fused embedding operations, and 2) it trains a policy network by interacting with an estimated Markov decision process (MDP) without real GPU execution. Please refer to the paper for more details.
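To make the two ideas concrete, here is a toy rollout in an estimated placement MDP: tables are placed on GPUs one at a time, and every cost comes from an estimator rather than from running kernels on a GPU. This is only an illustration, not the repository's implementation. The `Table` features, the hand-written `estimated_cost` (a stand-in for the learned cost network), the `greedy_policy` (a stand-in for the learned policy network) and the "slowest GPU" overall cost are all hypothetical simplifications, and nothing is trained here.

```python
import random
from collections import namedtuple

# Hypothetical per-table features; the real cost network uses richer features.
Table = namedtuple("Table", ["dim", "pooling_factor"])

def estimated_cost(shard):
    """Stand-in for the learned cost network: estimated cost of the fused
    embedding operation for the tables placed on one GPU."""
    return sum(t.dim * t.pooling_factor for t in shard)

class EstimatedPlacementMDP:
    """Toy MDP: tables are placed one at a time; every cost comes from
    estimated_cost(), so no GPU is executed during a rollout."""

    def __init__(self, tables, num_gpus):
        self.tables = tables
        self.num_gpus = num_gpus
        self.shards = [[] for _ in range(num_gpus)]
        self.t = 0  # index of the next table to place

    def done(self):
        return self.t == len(self.tables)

    def overall_cost(self):
        # Simplification: the slowest (bottleneck) GPU sets the overall cost.
        return max(estimated_cost(shard) for shard in self.shards)

    def step(self, gpu):
        """Place the next table on `gpu`; a reward is given only at the end."""
        self.shards[gpu].append(self.tables[self.t])
        self.t += 1
        return -self.overall_cost() if self.done() else 0.0

def greedy_policy(env):
    """Stand-in for the learned policy network: choose the GPU whose
    estimated cost is lowest after adding the next table."""
    table = env.tables[env.t]
    return min(range(env.num_gpus),
               key=lambda g: estimated_cost(env.shards[g] + [table]))

random.seed(0)
tables = [Table(random.choice([16, 32, 64]), random.uniform(1, 20)) for _ in range(10)]
env = EstimatedPlacementMDP(tables, num_gpus=4)
reward = 0.0
while not env.done():
    reward = env.step(greedy_policy(env))
print(f"estimated bottleneck cost: {-reward:.1f}")
```

In DreamShard, both stand-ins are neural networks; the point of the sketch is that a full episode can be rolled out, and a policy updated, using only estimated costs.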

Miscellaneous Resources: Have you heard of data-centric AI? Please check out our data-centric AI survey and awesome data-centric AI resources!

Cite this Work

If you find this project helpful, please cite:

@inproceedings{zha2022dreamshard,
  title={DreamShard: Generalizable Embedding Table Placement for Recommender Systems},
  author={Zha, Daochen and Feng, Louis and Tan, Qiaoyu and Liu, Zirui and Lai, Kwei-Herng and Bhushanam, Bhargav and Tian, Yuandong and Kejariwal, Arun and Hu, Xia},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Installation

Step 1: install PyTorch

pip3 install torch

Step 2: install FBGEMM

Follow the instructions at https://github.com/pytorch/FBGEMM to install the embedding operators.

Step 3: install DreamShard

pip3 install -r requirements.txt
pip3 install -e .
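After these three steps, one quick sanity check is to confirm that the required modules can be located. The sketch below assumes the import names `torch`, `fbgemm_gpu` (the FBGEMM GPU Python package) and `dreamshard` (this repository's package, installed by `pip3 install -e .`). Adjust them if your build differs. It uses `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util

# Assumed import names: PyTorch, the FBGEMM GPU operators, and this repository.
REQUIRED_MODULES = ["torch", "fbgemm_gpu", "dreamshard"]

def check_installation(modules=REQUIRED_MODULES):
    """Return {module_name: found?} without importing any of the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```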

Run on the DLRM Dataset

Step 1: Download DLRM dataset

Download the data with git lfs from https://github.com/facebookresearch/dlrm_datasets.

Step 2: Process the dataset

python3 tools/gen_dlrm_data.py

Note that you need to set the --data argument to the path of the downloaded DLRM dataset.

Step 3: Generate training and testing tasks

python3 tools/gen_tasks.py --T 50 --out-dir data/dlrm_tasks_50

The argument --T specifies the number of tables in each task, and --out-dir indicates the output directory.
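As a rough illustration of what a "task" is, the sketch below shows one plausible scheme: split the processed tables into disjoint train and test pools, then sample tasks of `T` distinct tables from each pool. The pool split, the number of tasks and the one-task-per-line, comma-separated format are hypothetical and may not match what `tools/gen_tasks.py` actually writes to `train.txt` and `test.txt`.

```python
import random

def sample_tasks(num_tables_in_pool, T, num_tasks, seed=0):
    """Hypothetical scheme: split table indices into disjoint train/test pools,
    then draw `num_tasks` tasks of T distinct tables from each pool."""
    rng = random.Random(seed)
    indices = list(range(num_tables_in_pool))
    rng.shuffle(indices)
    half = num_tables_in_pool // 2
    pools = {"train": indices[:half], "test": indices[half:]}
    return {
        split: [sorted(rng.sample(pool, T)) for _ in range(num_tasks)]
        for split, pool in pools.items()
    }

def to_lines(tasks):
    """Serialize one task per line as comma-separated table indices (assumed format)."""
    return [",".join(map(str, task)) for task in tasks]

tasks = sample_tasks(num_tables_in_pool=200, T=50, num_tasks=3)
for split, split_tasks in tasks.items():
    print(split, to_lines(split_tasks)[0][:40], "...")
```

Keeping the train and test pools disjoint means the test tasks contain only tables that were never seen in training. This is one way to probe the generalization the paper's title refers to.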

Step 4: Train DreamShard

python3 train.py --task-path data/dlrm_tasks_50/train.txt --gpu-devices 0,1,2,3 --max-memory 5 --out-dir models/dreamshard

Note that you need to set --gpu-devices and --max-memory according to your GPUs. You also need to specify --task-path. --out-dir indicates where the trained model will be saved.
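The README does not state the unit of --max-memory. Purely as an illustration, the sketch below treats it as a per-GPU budget in GiB and assumes fp32 weights, so a table with `num_rows` rows and dimension `dim` needs about `num_rows * dim * 4` bytes. It checks whether a candidate placement keeps every GPU within that budget. Optimizer state and activations are ignored, and the helper names are hypothetical.

```python
GIB = 1024 ** 3

def table_bytes(num_rows, dim, bytes_per_elem=4):
    """Approximate weight memory of one embedding table (fp32 by default)."""
    return num_rows * dim * bytes_per_elem

def placement_fits(placement, tables, num_gpus, max_memory_gib):
    """placement[i] is the GPU index of table i and tables[i] is (num_rows, dim).
    Returns True if every GPU stays within max_memory_gib (assumed unit: GiB)."""
    used = [0] * num_gpus
    for gpu, (num_rows, dim) in zip(placement, tables):
        used[gpu] += table_bytes(num_rows, dim)
    return all(u <= max_memory_gib * GIB for u in used)

# Three tables of 10M rows x 64 dims take about 2.4 GiB each in fp32.
tables = [(10_000_000, 64)] * 3
print(placement_fits([0, 0, 1], tables, num_gpus=4, max_memory_gib=5))  # True: two tables on GPU 0
print(placement_fits([0, 0, 0], tables, num_gpus=4, max_memory_gib=5))  # False: three tables on GPU 0
```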

Step 5: Evaluate DreamShard and baselines

python3 eval.py --alg models/dreamshard/9.pt --task-path data/dlrm_tasks_50/test.txt --gpu-devices 0,1,2,3 --max-memory 5

Note that you need to set --gpu-devices and --max-memory according to your GPUs. You also need to specify --task-path. --alg points to the saved model. Here 9.pt is the final saved model because we train for 10 iterations and save the model after each iteration, numbering the checkpoints from 0.
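If you change the number of training iterations, the final checkpoint will have a different name. Below is a small helper that picks the latest one from the --out-dir directory. It assumes checkpoints are named `<iteration>.pt`, as described above. It compares iteration numbers numerically, because a plain string sort would rank `9.pt` above `10.pt`.

```python
import tempfile
from pathlib import Path

def latest_checkpoint(out_dir):
    """Return the <iteration>.pt file with the highest iteration number."""
    ckpts = [p for p in Path(out_dir).glob("*.pt") if p.stem.isdigit()]
    if not ckpts:
        raise FileNotFoundError(f"no <iteration>.pt files in {out_dir}")
    return max(ckpts, key=lambda p: int(p.stem))

# Demo on a temporary directory containing 0.pt ... 10.pt.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(11):
        (Path(tmp) / f"{i}.pt").touch()
    demo = latest_checkpoint(tmp).name
print(demo)  # 10.pt
```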

To obtain the results of the baselines, simply change --alg to random, dim_greedy, lookup_greedy, size_greedy, or size_lookup_greedy.
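To evaluate the trained model and all five baselines in one go, you can loop over the --alg values and reuse the evaluation command shown above. The sketch below uses only the flags from that command. Its defaults mirror the example paths and should be adjusted to your setup. It prints the commands and executes them only when `dry_run=False`. It does not parse `eval.py`'s output, whose format is not documented here.

```python
import shlex
import subprocess

BASELINES = ["random", "dim_greedy", "lookup_greedy", "size_greedy", "size_lookup_greedy"]

def eval_command(alg, task_path="data/dlrm_tasks_50/test.txt",
                 gpu_devices="0,1,2,3", max_memory=5):
    """Build the eval.py command from the README with a different --alg."""
    return ["python3", "eval.py", "--alg", alg, "--task-path", task_path,
            "--gpu-devices", gpu_devices, "--max-memory", str(max_memory)]

def run_all(algs, dry_run=True):
    """Print each command; run it (stopping on failure) unless dry_run is set."""
    for alg in algs:
        cmd = eval_command(alg)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # The trained model first, then every baseline; set dry_run=False to execute.
    run_all(["models/dreamshard/9.pt"] + BASELINES, dry_run=True)
```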
