[NeurIPS 2022] DreamShard: Generalizable Embedding Table Placement for Recommender Systems
daochenzha/dreamshard
This is the implementation of the paper DreamShard: Generalizable Embedding Table Placement for Recommender Systems. We propose DreamShard, a reinforcement learning approach for embedding table placement. DreamShard has two novel ideas: 1) it learns a cost network to directly predict the costs of the fused embedding operations; 2) it trains a policy network by interacting with an estimated Markov decision process (MDP) without real GPU execution. Please refer to the paper for more details.
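To make the second idea concrete, here is a minimal Python sketch of an estimated placement MDP, written for illustration and not taken from this repository. Tables are assigned to devices one at a time, and instead of running kernels on a GPU, each device's cost is obtained by querying a cost model. The class `EstimatedPlacementMDP`, the linear `toy_cost` (a stand-in for the learned cost network), `greedy_rollout` (a stand-in for the learned policy network), and the reward (the negated maximum per-device cost at the end of an episode) are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the learned cost network: maps the feature
# vectors of the tables on one device to an estimated cost for that device.
CostModel = Callable[[Sequence[Sequence[float]]], float]


class EstimatedPlacementMDP:
    """Assigns tables to devices one at a time; costs come from a model, not a GPU."""

    def __init__(self, tables: List[List[float]], num_devices: int, cost_model: CostModel):
        self.tables = tables
        self.num_devices = num_devices
        self.cost_model = cost_model
        self.reset()

    def reset(self) -> List[List[int]]:
        self.next_table = 0
        self.placement: List[List[int]] = [[] for _ in range(self.num_devices)]
        return self.placement

    def device_costs(self) -> List[float]:
        # Query the cost model once per device; no kernels are executed.
        return [self.cost_model([self.tables[t] for t in dev]) for dev in self.placement]

    def step(self, device: int) -> Tuple[List[List[int]], float, bool]:
        self.placement[device].append(self.next_table)
        self.next_table += 1
        done = self.next_table == len(self.tables)
        # Assumed reward: the slowest device bounds the step time, so the
        # episode ends with the negated maximum estimated device cost.
        reward = -max(self.device_costs()) if done else 0.0
        return self.placement, reward, done


def toy_cost(features: Sequence[Sequence[float]]) -> float:
    # Hypothetical: cost is linear in each table's first feature.
    return sum(f[0] for f in features)


def greedy_rollout(env: EstimatedPlacementMDP) -> float:
    # Hypothetical policy: put the next table on the currently cheapest device.
    env.reset()
    done, reward = False, 0.0
    while not done:
        costs = env.device_costs()
        _, reward, done = env.step(costs.index(min(costs)))
    return reward


env = EstimatedPlacementMDP([[4.0], [3.0], [2.0], [1.0]], num_devices=2, cost_model=toy_cost)
print(greedy_rollout(env))  # -5.0
```

With four tables of toy cost 4, 3, 2 and 1 on two devices, the rollout ends with both devices at an estimated cost of 5. Because every step only calls the cost model, a policy can collect many such episodes cheaply, which is the point of training against an estimated MDP rather than real hardware.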
Miscellaneous Resources: Have you heard of data-centric AI? Please check out our data-centric AI survey and awesome data-centric AI resources!
If you find this project helpful, please cite:
@inproceedings{zha2022dreamshard,
  title={DreamShard: Generalizable Embedding Table Placement for Recommender Systems},
  author={Zha, Daochen and Feng, Louis and Tan, Qiaoyu and Liu, Zirui and Lai, Kwei-Herng and Bhargav, Bhushanam and Tian, Yuandong and Kejariwal, Arun and Hu, Xia},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
Step 1: install PyTorch
pip3 install torch
Step 2: install FBGEMM
Follow the instructions at https://github.com/pytorch/FBGEMM to install the embedding operators.
Step 3: install DreamShard
pip3 install -r requirements.txt
pip3 install -e .
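Before moving on, it can help to confirm that the dependencies from Steps 1 and 2 are visible to the interpreter you will use. This minimal sketch only checks that the modules can be found, without importing them; it assumes FBGEMM's embedding operators were installed as the `fbgemm_gpu` Python package (the usual name of FBGEMM's GPU build), so adjust `REQUIRED_MODULES` if your build differs.

```python
import importlib.util
from typing import Iterable, List

# Assumed module names for Step 1 (PyTorch) and Step 2 (FBGEMM GPU operators).
REQUIRED_MODULES = ("torch", "fbgemm_gpu")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names in `names` that the current interpreter cannot find."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```

Because `find_spec` does not execute the module, the check is fast and does not initialize CUDA.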
Step 1: Download the DLRM dataset
Download the data with git lfs from https://github.com/facebookresearch/dlrm_datasets
Step 2: Process the dataset
python3 tools/gen_dlrm_data.py
Note that you need to change the --data argument to the path of the downloaded DLRM dataset.
Step 3: Generate training and testing tasks
python3 tools/gen_tasks.py --T 50 --out-dir data/dlrm_tasks_50
The argument --T specifies the number of tables in each task, and --out-dir indicates the output directory.
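Each task is a placement problem over a set of tables. As an illustration only (this does not reproduce the logic of `tools/gen_tasks.py`), the sketch below first splits the table pool into disjoint training and testing halves and then samples `T` distinct tables per task, so test tasks contain only tables never seen in training. The function `make_tasks`, the 50/50 split, and the sampling scheme are assumptions.

```python
import random
from typing import List, Tuple


def make_tasks(num_tables: int, T: int, num_train: int, num_test: int,
               seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical task generator: each task is a sorted list of T table indices."""
    rng = random.Random(seed)
    ids = list(range(num_tables))
    rng.shuffle(ids)
    # Assumed 50/50 split so that test tasks only use tables unseen in training.
    half = num_tables // 2
    train_pool, test_pool = ids[:half], ids[half:]
    # sample() draws T distinct tables (raises ValueError if T > pool size).
    train = [sorted(rng.sample(train_pool, T)) for _ in range(num_train)]
    test = [sorted(rng.sample(test_pool, T)) for _ in range(num_test)]
    return train, test


train_tasks, test_tasks = make_tasks(num_tables=200, T=50, num_train=3, num_test=2)
```

Keeping the pools disjoint is what lets the test tasks measure generalization to unseen tables.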
Step 4: Train DreamShard
python3 train.py --task-path data/dlrm_tasks_50/train.txt --gpu-devices 0,1,2,3 --max-memory 5 --out-dir models/dreamshard
Note that you need to specify --gpu-devices and --max-memory based on your GPUs. You also need to specify --task-path. --out-dir indicates where the trained model will be saved.
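--max-memory bounds how much embedding-table data each GPU may hold. The sketch below shows the kind of per-device feasibility check such a limit implies; it assumes the value is in GiB and that each table is stored densely in fp32 (4 bytes per element). The helpers `table_bytes` and `fits_memory` are hypothetical and may not match how DreamShard accounts for memory.

```python
from typing import List, Sequence


def table_bytes(rows: int, dim: int, bytes_per_element: int = 4) -> int:
    """Dense storage of one embedding table (fp32 by default, an assumption)."""
    return rows * dim * bytes_per_element


def fits_memory(placement: List[List[int]], sizes: Sequence[int], max_memory_gb: float) -> bool:
    """True if every device's tables fit under the per-device limit (GiB assumed)."""
    limit = max_memory_gb * 1024 ** 3
    return all(sum(sizes[t] for t in device) <= limit for device in placement)


# Four tables of 2**20 rows x 256 dims in fp32, i.e. exactly 1 GiB each.
sizes = [table_bytes(2 ** 20, 256)] * 4
print(fits_memory([[0, 1, 2], [3]], sizes, max_memory_gb=5))  # True
print(fits_memory([[0, 1, 2], [3]], sizes, max_memory_gb=2))  # False
```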
Step 5: Evaluate DreamShard and baselines
python3 eval.py --alg models/dreamshard/9.pt --task-path data/dlrm_tasks_50/test.txt --gpu-devices 0,1,2,3 --max-memory 5
Note that you need to specify --gpu-devices and --max-memory based on your GPUs. You also need to specify --task-path. --alg points to the saved model. Here, 9.pt is the final saved model because we train for 10 iterations and save the model after each iteration.
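Since a checkpoint is written after every iteration, picking the final one by hand is error-prone if you change the number of iterations. This sketch assumes checkpoints are named `<iteration>.pt` inside the --out-dir directory (as 9.pt suggests) and returns the one with the highest iteration number; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(out_dir: str) -> Optional[Path]:
    """Return the `<iteration>.pt` file with the largest iteration, or None."""
    ckpts = [p for p in Path(out_dir).glob("*.pt") if p.stem.isdigit()]
    # Compare as integers: as strings, "9" sorts after "10".
    return max(ckpts, key=lambda p: int(p.stem), default=None)
```

After the 10-iteration run above, `latest_checkpoint("models/dreamshard")` would return `models/dreamshard/9.pt`, which can then be passed to --alg.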
To obtain the results of the baselines, simply change --alg to random, dim_greedy, lookup_greedy, size_greedy, or size_lookup_greedy.
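The greedy baselines differ mainly in which per-table cost proxy they balance across devices. The sketch below is a generic greedy balancer written for illustration, not the repository's implementation: tables are sorted by a cost key (largest first) and each is assigned to the device with the smallest accumulated cost. The `dim_key` proxy (embedding dimension) and `size_key` proxy (rows times dimension) are assumptions, and memory limits are not enforced here; lookup-based variants would plug in a key built from lookup statistics, whose exact definition is not reproduced.

```python
from typing import Callable, Dict, List

Table = Dict[str, int]


def greedy_place(tables: List[Table], num_devices: int,
                 key: Callable[[Table], float]) -> List[List[int]]:
    """Largest-first greedy: assign each table to the currently least-loaded device."""
    # sorted() is stable, so tables with equal keys keep their original order.
    order = sorted(range(len(tables)), key=lambda i: key(tables[i]), reverse=True)
    placement: List[List[int]] = [[] for _ in range(num_devices)]
    load = [0.0] * num_devices
    for i in order:
        device = min(range(num_devices), key=lambda d: load[d])
        placement[device].append(i)
        load[device] += key(tables[i])
    return placement


def dim_key(t: Table) -> float:   # hypothetical proxy: embedding dimension
    return t["dim"]


def size_key(t: Table) -> float:  # hypothetical proxy: number of parameters
    return t["rows"] * t["dim"]


tables = [
    {"rows": 10, "dim": 64},
    {"rows": 100, "dim": 32},
    {"rows": 1000, "dim": 32},
    {"rows": 5, "dim": 16},
]
print(greedy_place(tables, 2, dim_key))   # [[0, 3], [1, 2]]
print(greedy_place(tables, 2, size_key))  # [[2], [1, 0, 3]]
```

The same four tables end up in very different placements depending on the proxy, which is why a learned cost model can outperform any single hand-picked heuristic.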