
torch.backends#

Created On: Sep 16, 2020 | Last Updated On: Aug 26, 2025

torch.backends controls the behavior of various backends that PyTorch supports.

These backends include:

  • torch.backends.cpu

  • torch.backends.cuda

  • torch.backends.cudnn

  • torch.backends.cusparselt

  • torch.backends.mha

  • torch.backends.mps

  • torch.backends.mkl

  • torch.backends.mkldnn

  • torch.backends.nnpack

  • torch.backends.openmp

  • torch.backends.opt_einsum

  • torch.backends.xeon

torch.backends.cpu#

torch.backends.cpu.get_cpu_capability()[source]#

Return cpu capability as a string value.

Possible values:

  • “DEFAULT”

  • “VSX”

  • “Z VECTOR”

  • “NO AVX”

  • “AVX2”

  • “AVX512”

  • “SVE256”

Return type

str

torch.backends.cuda#

torch.backends.cuda.is_built()[source]#

Return whether PyTorch is built with CUDA support.

Note that this doesn’t necessarily mean CUDA is available; just that if this PyTorch binary were run on a machine with working CUDA drivers and devices, we would be able to use it.
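The difference between "built with CUDA" and "CUDA usable right now" can be sketched as follows. This example also uses `torch.cuda.is_available()`, which is not described in this section, to check the runtime side; the variable names are illustrative.

```python
import torch

built = torch.backends.cuda.is_built()   # was this binary compiled with CUDA support?
usable = torch.cuda.is_available()       # are a working driver and device present now?

# Fall back to the CPU unless both conditions hold.
device = torch.device("cuda" if built and usable else "cpu")
x = torch.ones(2, 2, device=device)
print(f"built={built}, usable={usable}, device={device}")
```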

torch.backends.cuda.matmul.allow_tf32#

A bool that controls whether TensorFloat-32 tensor cores may be used in matrix multiplications on Ampere or newer GPUs. allow_tf32 is going to be deprecated. See TensorFloat-32 (TF32) on Ampere (and later) devices.

torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction#

A bool that controls whether reduced precision reductions (e.g., with fp16 accumulation type) are allowed with fp16 GEMMs.

torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction#

A bool that controls whether reduced precision reductions are allowed with bf16 GEMMs.
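A hedged sketch of toggling one of these flags for a limited region of code and then restoring the earlier value. The alias `matmul_flags` is illustrative, and the matrix product below runs on the CPU, where the flag has no effect; it only shows where the precision-sensitive work would go.

```python
import torch

matmul_flags = torch.backends.cuda.matmul
previous = matmul_flags.allow_tf32  # remember the current setting

try:
    # Ask for matmuls without TensorFloat-32 in this region.
    matmul_flags.allow_tf32 = False
    a = torch.randn(8, 8)
    b = torch.randn(8, 8)
    product = a @ b  # placeholder for the precision-sensitive computation
finally:
    # Put the earlier value back so other code is unaffected.
    matmul_flags.allow_tf32 = previous
```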

torch.backends.cuda.cufft_plan_cache#

cufft_plan_cache contains the cuFFT plan caches for each CUDA device. Query a specific device i’s cache via torch.backends.cuda.cufft_plan_cache[i].

torch.backends.cuda.cufft_plan_cache.size#

A read-only int that shows the number of plans currently in a cuFFT plan cache.

torch.backends.cuda.cufft_plan_cache.max_size#

An int that controls the capacity of a cuFFT plan cache.

torch.backends.cuda.cufft_plan_cache.clear()#

Clears a cuFFT plan cache.
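The per-device cache attributes above might be inspected like this. The helper `describe_cufft_cache` is a hypothetical name introduced for illustration, and the sketch assumes the cache can only be queried when a CUDA device is present, so it returns None otherwise.

```python
import torch


def describe_cufft_cache(device_index=0):
    """Return (size, max_size) for one device's cuFFT plan cache, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    cache = torch.backends.cuda.cufft_plan_cache[device_index]
    return cache.size, cache.max_size


info = describe_cufft_cache()
print(info)
```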

torch.backends.cuda.preferred_blas_library(backend=None)[source]#

Override the library PyTorch uses for BLAS operations. Choose between cuBLAS, cuBLASLt, and CK [ROCm-only].

Warning

This flag is experimental and subject to change.

When PyTorch runs a CUDA BLAS operation it defaults to cuBLAS even if both cuBLAS and cuBLASLt are available. For PyTorch built for ROCm, hipBLAS, hipBLASLt, and CK may offer different performance. This flag (a str) allows overriding which BLAS library to use.

  • If “cublas” is set then cuBLAS will be used wherever possible.

  • If “cublaslt” is set then cuBLASLt will be used wherever possible.

  • If “ck” is set then CK will be used wherever possible.

  • If “default” (the default) is set then heuristics will be used to pick between the other options.

  • When no input is given, this function returns the currently preferred library.

  • Users may set the environment variable TORCH_BLAS_PREFER_CUBLASLT=1 to make cuBLASLt the preferred library globally. This flag only sets the initial value of the preferred library, and the preferred library may still be overridden by this function call later in your script.

Note: When a library is preferred, other libraries may still be used if the preferred library doesn’t implement the operation(s) called. This flag may achieve better performance if PyTorch’s library selection is incorrect for your application’s inputs.

Return type

_BlasBackend

torch.backends.cuda.preferred_rocm_fa_library(backend=None)[source]#

[ROCm-only] Override the backend PyTorch uses in ROCm environments for Flash Attention. Choose between AOTriton and CK.

Warning

This flag is experimental and subject to change.

When Flash Attention is enabled and desired, PyTorch defaults to using AOTriton as the backend. This flag (a str) allows users to override this backend to use composable_kernel.

  • If “default” is set then the default backend (currently AOTriton) will be used wherever possible.

  • If “aotriton” is set then AOTriton will be used wherever possible.

  • If “ck” is set then CK will be used wherever possible.

  • When no input is given, this function returns the currently preferred library.

  • Users may set the environment variable TORCH_ROCM_FA_PREFER_CK=1 to make CK the preferred library globally.

Note: When a library is preferred, other libraries may still be used if the preferred library doesn’t implement the operation(s) called. This flag may achieve better performance if PyTorch’s library selection is incorrect for your application’s inputs.

Return type

_ROCmFABackend

torch.backends.cuda.preferred_linalg_library(backend=None)[source]#

Override the heuristic PyTorch uses to choose between cuSOLVER and MAGMA for CUDA linear algebra operations.

Warning

This flag is experimental and subject to change.

When PyTorch runs a CUDA linear algebra operation it often uses the cuSOLVER or MAGMA libraries, and if both are available it decides which to use with a heuristic. This flag (a str) allows overriding those heuristics.

  • If “cusolver” is set then cuSOLVER will be used wherever possible.

  • If “magma” is set then MAGMA will be used wherever possible.

  • If “default” (the default) is set then heuristics will be used to pick between cuSOLVER and MAGMA if both are available.

  • When no input is given, this function returns the currently preferred library.

  • Users may set the environment variable TORCH_LINALG_PREFER_CUSOLVER=1 to make cuSOLVER the preferred library globally. This flag only sets the initial value of the preferred library, and the preferred library may still be overridden by this function call later in your script.

Note: When a library is preferred, other libraries may still be used if the preferred library doesn’t implement the operation(s) called. This flag may achieve better performance if PyTorch’s heuristic library selection is incorrect for your application’s inputs.

Currently supported linalg operators:

Return type

_LinalgBackend
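A minimal sketch of overriding the heuristic for one call and then returning to “default”. The helper `run_with_linalg_backend` is a hypothetical name, and the sketch assumes a non-default backend can only be requested on a CUDA-enabled build with a device present, so it skips the override otherwise.

```python
import torch


def run_with_linalg_backend(fn, backend="cusolver"):
    """Call fn() with `backend` preferred, then go back to the "default" heuristic."""
    if not (torch.backends.cuda.is_built() and torch.cuda.is_available()):
        return fn()  # assumption: nothing to override on a CPU-only setup
    torch.backends.cuda.preferred_linalg_library(backend)
    try:
        return fn()
    finally:
        torch.backends.cuda.preferred_linalg_library("default")


matrix = torch.eye(3) * 2
inverse = run_with_linalg_backend(lambda: torch.linalg.inv(matrix))
```

Resetting to “default” rather than to the previously returned value is a simplification; code that needs to preserve an earlier override would have to record it first.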

class torch.backends.cuda.SDPAParams#

torch.backends.cuda.flash_sdp_enabled()[source]#

Warning

This flag is beta and subject to change.

Returns whether flash scaled dot product attention is enabled or not.

torch.backends.cuda.enable_mem_efficient_sdp(enabled)[source]#

Warning

This flag is beta and subject to change.

Enables or disables memory efficient scaled dot product attention.

torch.backends.cuda.mem_efficient_sdp_enabled()[source]#

Warning

This flag is beta and subject to change.

Returns whether memory efficient scaled dot product attention is enabled or not.

torch.backends.cuda.enable_flash_sdp(enabled)[source]#

Warning

This flag is beta and subject to change.

Enables or disables flash scaled dot product attention.

torch.backends.cuda.math_sdp_enabled()[source]#

Warning

This flag is beta and subject to change.

Returns whether math scaled dot product attention is enabled or not.

torch.backends.cuda.enable_math_sdp(enabled)[source]#

Warning

This flag is beta and subject to change.

Enables or disables math scaled dot product attention.

torch.backends.cuda.fp16_bf16_reduction_math_sdp_allowed()[source]#

Warning

This flag is beta and subject to change.

Returns whether fp16/bf16 reduction in math scaled dot product attention is enabled or not.

torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(enabled)[source]#

Warning

This flag is beta and subject to change.

Enables or disables fp16/bf16 reduction in math scaled dot product attention.

torch.backends.cuda.cudnn_sdp_enabled()[source]#

Warning

This flag is beta and subject to change.

Returns whether cuDNN scaled dot product attention is enabled or not.

torch.backends.cuda.enable_cudnn_sdp(enabled)[source]#

Warning

This flag is beta and subject to change.

Enables or disables cuDNN scaled dot product attention.
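The query and enable functions above come in pairs, which can be sketched as follows. The names `cuda_backend`, `sdp_state`, and `flash_while_disabled` are illustrative; the cuDNN pair is left out only to keep the sketch short.

```python
import torch

cuda_backend = torch.backends.cuda

# Snapshot which scaled dot product attention backends are currently enabled.
sdp_state = {
    "flash": cuda_backend.flash_sdp_enabled(),
    "mem_efficient": cuda_backend.mem_efficient_sdp_enabled(),
    "math": cuda_backend.math_sdp_enabled(),
}
print(sdp_state)

# Turn the flash backend off, observe the flag, then put the original value back.
cuda_backend.enable_flash_sdp(False)
flash_while_disabled = cuda_backend.flash_sdp_enabled()
cuda_backend.enable_flash_sdp(sdp_state["flash"])
```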

torch.backends.cuda.is_flash_attention_available()[source]#

Check if PyTorch was built with FlashAttention for scaled_dot_product_attention.

Returns

True if FlashAttention is built and available; otherwise, False.

Return type

bool

Note

This function is dependent on a CUDA-enabled build of PyTorch. It will return False in non-CUDA environments.

torch.backends.cuda.can_use_flash_attention(params, debug=False)[source]#

Check if FlashAttention can be utilized in scaled_dot_product_attention.

Parameters

  • params (_SDPAParams) – An instance of SDPAParams containing the tensors for query, key, value, an optional attention mask, dropout rate, and a flag indicating if the attention is causal.

  • debug (bool) – Whether to logging.warn debug information as to why FlashAttention could not be run. Defaults to False.

Returns

True if FlashAttention can be used with the given parameters; otherwise, False.

Return type

bool

Note

This function is dependent on a CUDA-enabled build of PyTorch. It will return False in non-CUDA environments.

torch.backends.cuda.can_use_efficient_attention(params, debug=False)[source]#

Check if efficient_attention can be utilized in scaled_dot_product_attention.

Parameters

  • params (_SDPAParams) – An instance of SDPAParams containing the tensors for query, key, value, an optional attention mask, dropout rate, and a flag indicating if the attention is causal.

  • debug (bool) – Whether to logging.warn with information as to why efficient_attention could not be run. Defaults to False.

Returns

True if efficient_attention can be used with the given parameters; otherwise, False.

Return type

bool

Note

This function is dependent on a CUDA-enabled build of PyTorch. It will return False in non-CUDA environments.

torch.backends.cuda.can_use_cudnn_attention(params, debug=False)[source]#

Check if cudnn_attention can be utilized in scaled_dot_product_attention.

Parameters

  • params (_SDPAParams) – An instance of SDPAParams containing the tensors for query, key, value, an optional attention mask, dropout rate, and a flag indicating if the attention is causal.

  • debug (bool) – Whether to logging.warn with information as to why cuDNN attention could not be run. Defaults to False.

Returns

True if cuDNN can be used with the given parameters; otherwise, False.

Return type

bool

Note

This function is dependent on a CUDA-enabled build of PyTorch. It will return False in non-CUDA environments.

torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True, enable_cudnn=True)[source]#

Warning

This flag is beta and subject to change.

This context manager can be used to temporarily enable or disable any of the backends for scaled dot product attention. Upon exiting the context manager, the previous state of the flags will be restored.
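A small sketch of the save-and-restore behavior described above. It only inspects the flags and does not run attention. Recent PyTorch releases may emit a deprecation warning for this context manager, so the sketch silences warnings around it; the variable names are illustrative.

```python
import warnings

import torch

flash_before = torch.backends.cuda.flash_sdp_enabled()

with warnings.catch_warnings():
    # Assumption: a deprecation warning may be raised here on newer releases.
    warnings.simplefilter("ignore")
    with torch.backends.cuda.sdp_kernel(
        enable_flash=False, enable_math=True, enable_mem_efficient=False
    ):
        # Inside the block only the requested backends are enabled.
        flash_inside = torch.backends.cuda.flash_sdp_enabled()
        math_inside = torch.backends.cuda.math_sdp_enabled()

# After the block the earlier flag values are back in place.
flash_after = torch.backends.cuda.flash_sdp_enabled()
print(flash_before, flash_inside, math_inside, flash_after)
```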

torch.backends.cudnn#

torch.backends.cudnn.version()[source]#

Return the version of cuDNN.

torch.backends.cudnn.is_available()[source]#

Return a bool indicating if cuDNN is currently available.

torch.backends.cudnn.enabled#

A bool that controls whether cuDNN is enabled.

torch.backends.cudnn.allow_tf32#

A bool that controls whether TensorFloat-32 tensor cores may be used in cuDNN convolutions on Ampere or newer GPUs. allow_tf32 is going to be deprecated. See TensorFloat-32 (TF32) on Ampere (and later) devices.

torch.backends.cudnn.deterministic#

A bool that, if True, causes cuDNN to only use deterministic convolution algorithms. See also torch.are_deterministic_algorithms_enabled() and torch.use_deterministic_algorithms().

torch.backends.cudnn.benchmark#

A bool that, if True, causes cuDNN to benchmark multiple convolution algorithms and select the fastest.

torch.backends.cudnn.benchmark_limit#

An int that specifies the maximum number of cuDNN convolution algorithms to try when torch.backends.cudnn.benchmark is True. Set benchmark_limit to zero to try every available algorithm. Note that this setting only affects convolutions dispatched via the cuDNN v8 API.
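One common use of these flags is to trade speed for reproducibility. The following is a sketch under that assumption, with illustrative variable names; the flags can be set even when cuDNN is not available, in which case they simply have no effect.

```python
import torch

cudnn = torch.backends.cudnn
print("cuDNN available:", cudnn.is_available())

previous = (cudnn.deterministic, cudnn.benchmark)
try:
    cudnn.deterministic = True  # only deterministic convolution algorithms
    cudnn.benchmark = False     # do not benchmark multiple algorithms
    settings_during_run = (cudnn.deterministic, cudnn.benchmark)
    # ... run the reproducibility-sensitive workload here ...
finally:
    # Restore whatever was configured before.
    cudnn.deterministic, cudnn.benchmark = previous
```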

torch.backends.cusparselt#

torch.backends.cusparselt.version()[source]#

Return the version of cuSPARSELt.

Return type

Optional[int]

torch.backends.cusparselt.is_available()[source]#

Return a bool indicating if cuSPARSELt is currently available.

Return type

bool

torch.backends.mha#

torch.backends.mha.get_fastpath_enabled()[source]#

Returns whether fast path for TransformerEncoder and MultiHeadAttention is enabled, or True if jit is scripting.

Note

The fastpath might not be run even if get_fastpath_enabled returns True unless all conditions on inputs are met.

Return type

bool

torch.backends.mha.set_fastpath_enabled(value)[source]#

Sets whether fast path is enabled.

torch.backends.miopen#

torch.backends.miopen.immediate#

A bool that, if True, causes MIOpen to use Immediate Mode (https://rocm.docs.amd.com/projects/MIOpen/en/latest/how-to/find-and-immediate.html).

torch.backends.mps#

torch.backends.mps.is_available()[source]#

Return a bool indicating if MPS is currently available.

Return type

bool

torch.backends.mps.is_built()[source]#

Return whether PyTorch is built with MPS support.

Note that this doesn’t necessarily mean MPS is available; just that if this PyTorch binary were run on a machine with working MPS drivers and devices, we would be able to use it.

Return type

bool
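A typical device-selection pattern using the two checks might look like the sketch below; the fallback to the CPU and the variable names are choices made for this example.

```python
import torch
import torch.backends.mps

if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    if torch.backends.mps.is_built():
        # Built with MPS support, but no usable MPS device on this machine.
        print("MPS is built but not available; using the CPU instead.")
    device = torch.device("cpu")

x = torch.zeros(3, device=device)
```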

torch.backends.mkl#

torch.backends.mkl.is_available()[source]#

Return whether PyTorch is built with MKL support.

class torch.backends.mkl.verbose(enable)[source]#

On-demand oneMKL verbosing functionality.

To make it easier to debug performance issues, oneMKL can dump verbose messages containing execution information, such as duration, while executing the kernel. The verbosing functionality can be invoked via an environment variable named MKL_VERBOSE. However, this approach dumps messages at every step, which produces a large volume of verbose output. Moreover, when investigating performance issues, verbose messages from a single iteration are generally enough. This on-demand verbosing functionality makes it possible to control the scope of verbose message dumping. In the following example, verbose messages are dumped for the second inference only.

    import torch

    # `model` and `data` are assumed to be defined elsewhere.
    model(data)  # first inference: no verbose output
    with torch.backends.mkl.verbose(torch.backends.mkl.VERBOSE_ON):
        model(data)  # second inference: verbose messages are dumped

Parameters

level – Verbose level:

  • VERBOSE_OFF: Disable verbosing

  • VERBOSE_ON: Enable verbosing

torch.backends.mkldnn#

torch.backends.mkldnn.is_available()[source]#

class torch.backends.mkldnn.verbose(level)[source]#

On-demand oneDNN (formerly MKL-DNN) verbosing functionality.

To make it easier to debug performance issues, oneDNN can dump verbose messages containing information such as kernel size, input data size, and execution duration while executing the kernel. The verbosing functionality can be invoked via an environment variable named DNNL_VERBOSE. However, this approach dumps messages at every step, which produces a large volume of verbose output. Moreover, when investigating performance issues, verbose messages from a single iteration are generally enough. This on-demand verbosing functionality makes it possible to control the scope of verbose message dumping. In the following example, verbose messages are dumped for the second inference only.

    import torch

    # `model` and `data` are assumed to be defined elsewhere.
    model(data)  # first inference: no verbose output
    with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
        model(data)  # second inference: verbose messages are dumped

Parameters

level – Verbose level:

  • VERBOSE_OFF: Disable verbosing

  • VERBOSE_ON: Enable verbosing

  • VERBOSE_ON_CREATION: Enable verbosing, including oneDNN kernel creation

torch.backends.nnpack#

torch.backends.nnpack.is_available()[source]#

Return whether PyTorch is built with NNPACK support.

torch.backends.nnpack.flags(enabled=False)[source]#

Context manager for setting whether NNPACK is enabled globally.

torch.backends.nnpack.set_flags(_enabled)[source]#

Set whether NNPACK is enabled globally.

torch.backends.openmp#

torch.backends.openmp.is_available()[source]#

Return whether PyTorch is built with OpenMP support.

torch.backends.opt_einsum#

torch.backends.opt_einsum.is_available()[source]#

Return a bool indicating if opt_einsum is currently available.

You must install opt-einsum in order for torch to automatically optimize einsum. To make opt-einsum available, you can install it along with torch: pip install torch[opt-einsum] or by itself: pip install opt-einsum. If the package is installed, torch will import it automatically and use it accordingly. Use this function to check whether opt-einsum was installed and properly imported by torch.

Return type

bool

torch.backends.opt_einsum.get_opt_einsum()[source]#

Return the opt_einsum package if opt_einsum is currently available, else None.

Return type

Any

torch.backends.opt_einsum.enabled#

A bool that controls whether opt_einsum is enabled (True by default). If so, torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html) if available to calculate an optimal path of contraction for faster performance.

If opt_einsum is not available, torch.einsum will fall back to the default contraction path of left to right.

torch.backends.opt_einsum.strategy#

A str that specifies which strategies to try when torch.backends.opt_einsum.enabled is True. By default, torch.einsum will try the “auto” strategy, but the “greedy” and “optimal” strategies are also supported. Note that the “optimal” strategy is factorial on the number of inputs as it tries all possible paths. See more details in opt_einsum’s docs (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).
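A minimal sketch of choosing a contraction strategy for a three-operand einsum. It assumes the strategy should only be changed when opt_einsum is available, so the assignment is guarded by is_available(); the tensor shapes and names are arbitrary.

```python
import torch
import torch.backends.opt_einsum

a = torch.randn(4, 5, dtype=torch.float64)
b = torch.randn(5, 6, dtype=torch.float64)
c = torch.randn(6, 3, dtype=torch.float64)

if torch.backends.opt_einsum.is_available():
    previous = torch.backends.opt_einsum.strategy
    torch.backends.opt_einsum.strategy = "greedy"  # pick the contraction strategy
    try:
        out = torch.einsum("ij,jk,kl->il", a, b, c)
    finally:
        torch.backends.opt_einsum.strategy = previous  # restore the earlier strategy
else:
    # Without opt_einsum, the default left-to-right contraction path is used.
    out = torch.einsum("ij,jk,kl->il", a, b, c)
```

The result is the same either way; only the order in which the operands are contracted may differ.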

torch.backends.xeon#
