
Adding mixed precision training support #179


Open
vinhngx wants to merge 3 commits into CSAILVision:master from vinhngx:master

Conversation

@vinhngx commented Jun 19, 2019 (edited)

This PR adds mixed precision training support using APEX.
https://github.com/NVIDIA/apex

Automatic mixed precision training makes use of both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput.

Mixed precision training can be enabled by passing the --apex flag to the training script, for example:

python3 train.py --gpus 0-3 --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml --apex

How mixed precision works

Mixed precision is the use of both float16 and float32 data types when training a model.

Performing arithmetic operations in float16 takes advantage of specialized processing units such as the Tensor cores on NVIDIA GPUs. However, because float16 has a smaller representable range, performing the entire training in float16 can cause gradient underflow, leading to convergence or model-quality issues.

However, performing only select arithmetic operations in float16 results in performance gains when using compatible hardware accelerators, decreasing training time and reducing memory usage, typically without sacrificing model performance.
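
For concreteness, here is a minimal sketch of how APEX amp is typically wired into a PyTorch training loop; the model, optimizer, and synthetic data below are illustrative placeholders, not code from this repository:

import torch
from apex import amp  # https://github.com/NVIDIA/apex

# Placeholder model, optimizer, and synthetic data; this repo builds its own
# encoder/decoder modules, optimizers, and data loaders.
model = torch.nn.Conv2d(3, 150, kernel_size=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02)
criterion = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(2, 3, 64, 64), torch.randint(0, 150, (2, 64, 64)))
          for _ in range(4)]

# opt_level="O1" runs whitelisted ops in FP16 while keeping FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images.cuda()), labels.cuda())
    # Dynamic loss scaling guards the FP16 gradients against underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()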

To learn more about mixed precision and how it works:

Overview of Automatic Mixed Precision for Deep Learning
NVIDIA Mixed Precision Training Documentation
NVIDIA Deep Learning Performance Guide

@vinhngx (author)

Please, could any maintainer of this repo help review this?

@vinhngx (author)

@hangzhaomit?

@vinhngx changed the title from "adding mixed precision training, which improves throughput by about …" to "Adding mixed precision training support" on Jul 23, 2019
@ghost

@vinhngx This seems interesting. How much gain in training time were you able to achieve by using mixed precision training?
Also, do we have to make any changes to test.py when running inference?

@seedlit

@vinhngx Have you also prepared a script with mixed precision support for running inferences?

@vinhngx (author)

@vinhngx This seems interesting. How much gain in training time were you able to achieve by using mixed precision training?
Also, do we have to make any changes to test.py when running inference?

@ghost Though the training is done with FP16, the weights are stored as FP32, so there is no change required when doing inference (in FP32).

@vinhngx (author) commented Dec 13, 2019 (edited)

@vinhngx Have you also prepared a script with mixed precision support for running inferences?

@seedlit I have not produced a script for mixed precision inference here. For serious productization, though, converting the model to ONNX and then accelerating it with TensorRT is recommended.
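
As a rough illustration of that path, here is a sketch of an ONNX export; the placeholder network, input size, and file name below are assumptions, not code from this repo:

import torch
import torch.nn as nn

# Placeholder standing in for a trained segmentation model from this repo.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.Conv2d(16, 150, 1)).cuda().eval()
dummy_input = torch.randn(1, 3, 512, 512).cuda()

# Export to ONNX; the resulting file can then be built into a TensorRT engine,
# e.g. with `trtexec --onnx=model.onnx --fp16`.
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)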

@seedlit commented Dec 18, 2019 (edited)

@vinhngx I am using this script for mixed precision training (on multiple GPUs). However, it doesn't work well with O2 and O3, see #227. For O1, my GPU memory usage increases compared to FP32 training, and training time also increases. It seems we have to wait a little until mixed precision training support is introduced in PyTorch 1.5.

I am also getting 'gradient overflow' when training on a single GPU with O1 (when setting batch_size_per_gpu = 1; for batch_size_per_gpu > 1 it works fine), even after

with amp.scale_loss(loss1, optimizer1) as scaled_loss:
    scaled_loss.backward()

The message I am getting is:
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0

Do you know of any workaround for this?


@JulianJuaner commented Mar 13, 2020 (edited)

I am working on acceleration using mixed-precision training. It works fine on a single GPU with a few lines of modification (about 8x faster than using 4 cards with the same batch size, and half the memory used), but it does not work for multiple GPUs. I think this is mainly caused by the new DataParallel and BatchNormSync implementation.
Or is it possible to modify the code to get better performance on multiple GPUs?

Update: after referencing hszhao's DDP implementation at this link, my program now uses half the GPU memory and trains about 3x faster on multiple devices.
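
For reference, a minimal sketch of the DistributedDataParallel pattern being described; the launcher invocation, placeholder network, and sizes below are illustrative assumptions, not code from hszhao's repo or this one:

import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launched as: python -m torch.distributed.launch --nproc_per_node=4 train_ddp.py
def main(local_rank):
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Placeholder network; SyncBatchNorm stands in for the repo's BatchNormSync.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, 3, padding=1),
        torch.nn.BatchNorm2d(16),
    ).cuda()
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[local_rank])
    # ...per-rank DataLoader with DistributedSampler, then the usual training loop.

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()
    main(args.local_rank)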
