🐛 Describe the bug
We are measuring performance on Windows and observed that Windows wheel performance regressed with the rls/2.4 pre-release wheel. In my local environment, the dev20240410 nightly wheel (downloaded a few months ago) shows improved Windows performance, which might be related to an optimization PR by @xuhancn. We are doing our best to find which nightly wheel introduced this regression, but the oldest nightly wheel we still have is 0513, which is already regressed.
| Model | Batch size | PyTorch 2.1 THP | PyTorch 0314 THP | PyTorch 0410 THP | PyTorch 2.4 THP |
|---|---|---|---|---|---|
| RN50 | 4 | 40.1 | 40.0962 | 41.401 | 13.7635 |
| Mobilenetv3 Large | 8 | 147.3972 | 139.6188 | 219.277 | 116.182 |
| distilbert-base | 8 | 6.641 | 5.603333 | 8.7825 | 3.249667 |
| roberta-base | 8 | 3.48425 | 2.572 | 4.201 | 1.6335 |
Hardware: 13th Gen Intel Core i7-13700H 2.4GHz
OS: Windows 11 23H2 22631.3593
Versions
How to reproduce:
https://github.com/WeizhuoZhang-intel/win_benchmarks/blob/main/torchvision_models.py
```shell
# torchvision
python torchvision_models.py

# transformers
pip install datasets evaluate accelerate transformers==4.34.1 scipy scikit-learn
git clone -b v4.34.1 --depth 1 https://github.com/huggingface/transformers.git
cd .\transformers\examples\pytorch\text-classification\
python run_glue.py --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
python run_glue.py --model_name_or_path "deepset/roberta-base-squad2" --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
```
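For reference, a minimal sketch of how throughput numbers like those in the table above can be collected (assuming THP means samples per second). `run_inference` here is a hypothetical stand-in for a model forward pass; the actual measurement logic lives in the linked `torchvision_models.py`:

```python
import time

def measure_throughput(infer, batch_size, warmup=3, iters=10):
    """Return throughput in samples/sec for a zero-argument callable `infer`."""
    for _ in range(warmup):
        infer()  # warm up caches and any lazy initialization before timing
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed

# Hypothetical stand-in for one batched forward pass of a model.
def run_inference():
    sum(i * i for i in range(10_000))

print(f"{measure_throughput(run_inference, batch_size=8):.1f} samples/sec")
```

Comparing the same callable across two installed wheels (e.g. dev20240410 vs. the 2.4 pre-release) with a harness like this is how per-wheel regressions such as the ones in the table can be isolated.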
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite