
CUDA Stream Sanitizer#

Created On: Sep 09, 2022 | Last Updated On: Oct 31, 2022

Note

This is a prototype feature, which means it is at an early stage for feedback and testing, and its components are subject to change.

Overview#

This module introduces CUDA Sanitizer, a tool for detecting synchronization errors between kernels run on different streams.

It stores information on accesses to tensors to determine if they are synchronized or not. When enabled in a Python program and a possible data race is detected, a detailed warning will be printed and the program will exit.

It can be enabled either by importing this module and calling enable_cuda_sanitizer() or by exporting the TORCH_CUDA_SANITIZER environment variable.
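
For example, a minimal sketch of enabling the sanitizer from Python (the tensor work around the call is illustrative; only enable_cuda_sanitizer() is part of this module):

import torch
import torch.cuda._sanitizer as csan

# Enable CSAN before any CUDA work is launched so that all accesses are tracked.
csan.enable_cuda_sanitizer()

a = torch.ones(8, device="cuda")  # kernel launches from here on are analyzed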

Usage#

Here is an example of a simple synchronization error in PyTorch:

import torch

a = torch.rand(10000, device="cuda")

with torch.cuda.stream(torch.cuda.Stream()):
    torch.mul(a, 5, out=a)

The a tensor is initialized on the default stream and, without any synchronization methods, modified on a new stream. The two kernels will run concurrently on the same tensor, which might cause the second kernel to read uninitialized data before the first one was able to write it, or the first kernel might overwrite part of the result of the second. When this script is run on the command line with:

TORCH_CUDA_SANITIZER=1 python example_error.py

the following output is printed by CSAN:

============================
CSAN detected a possible data race on tensor with data pointer 139719969079296
Access by stream 94646435460352 during kernel:
aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
writing to argument(s) self, out, and to the output
With stack trace:
  File "example_error.py", line 6, in <module>
    torch.mul(a, 5, out=a)
  ...
  File "pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
    stack_trace = traceback.StackSummary.extract(

Previous access by stream 0 during kernel:
aten::rand(int[] size, *, int? dtype=None, Device? device=None) -> Tensor
writing to the output
With stack trace:
  File "example_error.py", line 3, in <module>
    a = torch.rand(10000, device="cuda")
  ...
  File "pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
    stack_trace = traceback.StackSummary.extract(

Tensor was allocated with stack trace:
  File "example_error.py", line 3, in <module>
    a = torch.rand(10000, device="cuda")
  ...
  File "pytorch/torch/cuda/_sanitizer.py", line 420, in _handle_memory_allocation
    traceback.StackSummary.extract(

This gives extensive insight into the origin of the error:

  • A tensor was incorrectly accessed from streams with ids: 0 (default stream) and 94646435460352 (new stream); the sketch after this list shows how these ids can be matched to objects in the program

  • The tensor was allocated by invoking a = torch.rand(10000, device="cuda")

  • The faulty accesses were caused by operators
    • a = torch.rand(10000, device="cuda") on stream 0

    • torch.mul(a, 5, out=a) on stream 94646435460352

  • The error message also displays the schemas of the invoked operators, along with a note showing which arguments of the operators correspond to the affected tensor.

    • In the example, it can be seen that tensor a corresponds to arguments self, out and the output value of the invoked operator torch.mul.
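
If it is unclear which objects in the program the reported numbers refer to, they can be printed directly. The sketch below assumes that the reported data pointer is the value returned by Tensor.data_ptr() and that the stream ids are the raw CUDA stream handles exposed as Stream.cuda_stream (0 for the default stream), which is how the report above lines up:

import torch

a = torch.rand(10000, device="cuda")
side = torch.cuda.Stream()

print(a.data_ptr())                             # compare with the reported data pointer
print(torch.cuda.default_stream().cuda_stream)  # 0, the default stream
print(side.cuda_stream)                         # the id reported for the new stream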

See also

The list of supported torch operators and their schemas can be viewed here.

The bug can be fixed by forcing the new stream to wait for the default stream:

with torch.cuda.stream(torch.cuda.Stream()):
    torch.cuda.current_stream().wait_stream(torch.cuda.default_stream())
    torch.mul(a, 5, out=a)

When the script is run again, there are no errors reported.
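
The same ordering can also be expressed with a CUDA event. The following is a sketch of an equivalent fix, not the form used above; it records an event on the default stream after a is written and makes the new stream wait for it:

import torch

a = torch.rand(10000, device="cuda")

ready = torch.cuda.Event()
ready.record(torch.cuda.default_stream())  # marks the point where `a` has been written

with torch.cuda.stream(torch.cuda.Stream()):
    ready.wait()            # the current (new) stream waits for the recorded event
    torch.mul(a, 5, out=a)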

API Reference#

torch.cuda._sanitizer.enable_cuda_sanitizer()[source]#

Enable CUDA Sanitizer.

The sanitizer will begin to analyze low-level CUDA calls invoked by torch functions for synchronization errors. All data races found will be printed to the standard error output along with stack traces of suspected causes. For best results, the sanitizer should be enabled at the very beginning of the program.
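
A minimal usage sketch (the tensor work is only illustrative; for meaningful results the call should come before any CUDA activity):

import torch
from torch.cuda._sanitizer import enable_cuda_sanitizer

enable_cuda_sanitizer()  # start tracking CUDA accesses from this point on

x = torch.rand(8, device="cuda")
torch.mul(x, 2, out=x)   # runs on the default stream only, so no race is reported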