🐛 Describe the bug
We are measuring performance on Windows and observed that Windows wheel performance regressed with the rls/2.4 pre-release wheel. In my local environment, the dev20240410 nightly wheel (downloaded a few months ago) shows improved Windows performance, which might be related to an optimization PR by @xuhancn. We are doing our best to find which nightly wheel introduced this regression, but the oldest nightly wheel we still have is 0513, which is already regressed.
| Model | Batch size | PyTorch 2.1 THP | PyTorch 0314 THP | PyTorch 0410 THP | PyTorch 2.4 THP |
|---|---|---|---|---|---|
| RN50 | 4 | 40.1 | 40.0962 | 41.401 | 13.7635 |
| Mobilenetv3 Large | 8 | 147.3972 | 139.6188 | 219.277 | 116.182 |
| distilbert-base | 8 | 6.641 | 5.603333 | 8.7825 | 3.249667 |
| roberta-base | 8 | 3.48425 | 2.572 | 4.201 | 1.6335 |
Hardware: 13th Gen Intel Core i7-13700H 2.4GHz
OS: Windows 11 23H2 22631.3593
Versions
How to reproduce:
https://github.com/WeizhuoZhang-intel/win_benchmarks/blob/main/torchvision_models.py
```shell
# torchvision
python torchvision_models.py

# transformers
pip install datasets evaluate accelerate transformers==4.34.1 scipy scikit-learn
git clone -b v4.34.1 --depth 1 https://github.com/huggingface/transformers.git
cd .\transformers\examples\pytorch\text-classification\
python run_glue.py --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
python run_glue.py --model_name_or_path "deepset/roberta-base-squad2" --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
```
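For reference, a minimal sketch of how throughput numbers like those in the table above can be collected (assuming THP means samples per second). `run_inference` here is a hypothetical stand-in for a model forward pass; the actual measurement logic lives in the linked `torchvision_models.py`:

```python
import time

def measure_throughput(infer, batch_size, warmup=3, iters=10):
    """Return throughput in samples/sec for a zero-argument callable `infer`."""
    for _ in range(warmup):
        infer()  # warm up caches and any lazy initialization before timing
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed

# Hypothetical stand-in for one batched forward pass of a model.
def run_inference():
    sum(i * i for i in range(10_000))

print(f"{measure_throughput(run_inference, batch_size=8):.1f} samples/sec")
```

Comparing the same callable across two installed wheels (e.g. dev20240410 vs. the 2.4 pre-release) with a harness like this is how per-wheel regressions such as the ones in the table can be isolated.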
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite