# NVIDIA/apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in PyTorch. Some of the code here will be included in upstream PyTorch eventually. The intent of Apex is to make up-to-date utilities available to users as quickly as possible.
Each `apex.contrib` module requires one or more install options other than `--cpp_ext` and `--cuda_ext`. Note that contrib modules do not necessarily support stable PyTorch releases; some of them may only be compatible with nightlies.
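For example, a sketch of building one contrib module (here `--xentropy`, taken from the table at the end of this README), following the same pip pattern as the from-source commands below and assuming pip >= 23.1:

```bash
git clone https://github.com/NVIDIA/apex
cd apex
# Build the core extensions plus the xentropy contrib module
# (repeated --config-settings keys require pip >= 23.1).
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
  --config-settings "--build-option=--cpp_ext" \
  --config-settings "--build-option=--cuda_ext" \
  --config-settings "--build-option=--xentropy" ./
```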
NVIDIA PyTorch Containers are available on NGC: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch. The containers come with all the custom extensions available at the moment.
See the NGC documentation for details such as:
- how to pull a container
- how to run a pulled container
- release notes
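As a quick illustration (the image tag below is hypothetical; check NGC for current releases), pulling and running a container typically looks like:

```bash
# Pull an NVIDIA PyTorch container; the tag is illustrative only.
docker pull nvcr.io/nvidia/pytorch:24.01-py3
# Run it interactively with GPU access.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.01-py3
```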
To install Apex from source, we recommend using the nightly PyTorch obtainable from https://github.com/pytorch/pytorch. The latest stable release obtainable from https://pytorch.org should also work. We recommend installing Ninja to make compilation faster.

For performance and full functionality, we recommend installing Apex with CUDA and C++ extensions via
```bash
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
To reduce the build time of Apex, builds can be parallelized via

```bash
NVCC_APPEND_FLAGS="--threads 4" pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --cuda_ext --parallel 8" ./
```
When CPU cores or memory are limited, the `--parallel` option is generally preferred over `--threads`. See [pull #1882](https://github.com/NVIDIA/apex/pull/1882) for more details.
Apex also supports a Python-only build via

```bash
pip install -v --disable-pip-version-check --no-build-isolation --no-cache-dir ./
```
A Python-only build omits:

- Fused kernels required to use `apex.optimizers.FusedAdam`.
- Fused kernels required to use `apex.normalization.FusedLayerNorm` and `apex.normalization.FusedRMSNorm`.
- Fused kernels that improve the performance and numerical stability of `apex.parallel.SyncBatchNorm`.
- Fused kernels that improve the performance of `apex.parallel.DistributedDataParallel` and `apex.amp`.

`DistributedDataParallel`, `amp`, and `SyncBatchNorm` will still be usable, but they may be slower.
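A quick way to check what was actually built (a sketch; the module names are the compiled-extension names from the table below) is to try importing the extensions directly:

```bash
# Each import succeeds only if the corresponding extension was compiled.
python -c "import apex_C"                  # from --cpp_ext
python -c "import amp_C"                   # from --cuda_ext
python -c "import fused_layer_norm_cuda"   # from --cuda_ext
python -c "import syncbn"                  # from --cuda_ext
```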
Windows support is experimental, but

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" .
```

may work if you were able to build PyTorch from source on your system. A Python-only build via

```bash
pip install -v --no-cache-dir .
```

is more likely to work.
If you installed PyTorch in a Conda environment, make sure to install Apex in that same environment.
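For example (with a hypothetical environment name):

```bash
# Activate the environment that holds your PyTorch install first;
# "my-pytorch-env" is a placeholder.
conda activate my-pytorch-env
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation ./
```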
If a requirement of a module is not met, then it will not be built.
| Module Name | Install Option | Misc |
|---|---|---|
| `apex_C` | `--cpp_ext` | |
| `amp_C` | `--cuda_ext` | |
| `syncbn` | `--cuda_ext` | |
| `fused_layer_norm_cuda` | `--cuda_ext` | `apex.normalization` |
| `mlp_cuda` | `--cuda_ext` | |
| `scaled_upper_triang_masked_softmax_cuda` | `--cuda_ext` | |
| `generic_scaled_masked_softmax_cuda` | `--cuda_ext` | |
| `scaled_masked_softmax_cuda` | `--cuda_ext` | |
| `fused_weight_gradient_mlp_cuda` | `--cuda_ext` | Requires CUDA >= 11 |
| `permutation_search_cuda` | `--permutation_search` | `apex.contrib.sparsity` |
| `bnp` | `--bnp` | `apex.contrib.groupbn` |
| `xentropy` | `--xentropy` | `apex.contrib.xentropy` |
| `focal_loss_cuda` | `--focal_loss` | `apex.contrib.focal_loss` |
| `fused_index_mul_2d` | `--index_mul_2d` | `apex.contrib.index_mul_2d` |
| `fused_adam_cuda` | `--deprecated_fused_adam` | `apex.contrib.optimizers` |
| `fused_lamb_cuda` | `--deprecated_fused_lamb` | `apex.contrib.optimizers` |
| `fast_layer_norm` | `--fast_layer_norm` | `apex.contrib.layer_norm`; different from `fused_layer_norm` |
| `fmhalib` | `--fmha` | `apex.contrib.fmha` |
| `fast_multihead_attn` | `--fast_multihead_attn` | `apex.contrib.multihead_attn` |
| `transducer_joint_cuda` | `--transducer` | `apex.contrib.transducer` |
| `transducer_loss_cuda` | `--transducer` | `apex.contrib.transducer` |
| `cudnn_gbn_lib` | `--cudnn_gbn` | Requires cuDNN >= 8.5; `apex.contrib.cudnn_gbn` |
| `peer_memory_cuda` | `--peer_memory` | `apex.contrib.peer_memory` |
| `nccl_p2p_cuda` | `--nccl_p2p` | Requires NCCL >= 2.10; `apex.contrib.nccl_p2p` |
| `fast_bottleneck` | `--fast_bottleneck` | Requires `peer_memory_cuda` and `nccl_p2p_cuda`; `apex.contrib.bottleneck` |
| `fused_conv_bias_relu` | `--fused_conv_bias_relu` | Requires cuDNN >= 8.4; `apex.contrib.conv_bias_relu` |
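After a successful build with `--cuda_ext`, a short smoke test such as the following (a sketch assuming a CUDA-capable GPU) exercises one of the fused kernels:

```bash
python - <<'EOF'
# Minimal FusedLayerNorm smoke test; requires the --cuda_ext build.
import torch
from apex.normalization import FusedLayerNorm

layer = FusedLayerNorm(512).cuda()           # normalize over the last dim of size 512
x = torch.randn(8, 128, 512, device="cuda")
print(layer(x).shape)                        # expect torch.Size([8, 128, 512])
EOF
```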