NVIDIA CUDA Toolkit Documentation
CUDA Runtime API - v13.0.2 - Last updated October 9, 2025

3. Stream synchronization behavior

Default stream

The default stream, used when 0 is passed as a cudaStream_t or by APIs that operate on a stream implicitly, can be configured to have either legacy or per-thread synchronization behavior as described below.

The behavior can be controlled per compilation unit with the --default-stream nvcc option. Alternatively, per-thread behavior can be enabled by defining the CUDA_API_PER_THREAD_DEFAULT_STREAM macro before including any CUDA headers. Either way, the CUDA_API_PER_THREAD_DEFAULT_STREAM macro will be defined in compilation units using per-thread synchronization behavior.
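As a minimal sketch of the paragraph above, the following program reports which behavior its compilation unit was built with by testing the CUDA_API_PER_THREAD_DEFAULT_STREAM macro. The file name and the helper usesPerThreadDefaultStream are hypothetical names chosen for illustration. The build lines in the comments assume the option is spelled `--default-stream per-thread`. They pass the setting on the command line because, as far as we know, nvcc includes the runtime header implicitly for .cu files, so a #define placed inside such a file may come too late.

```cuda
// default_stream_mode.cu (hypothetical file name)
// Legacy build (assumed nvcc default): nvcc default_stream_mode.cu
// Per-thread build:                    nvcc --default-stream per-thread default_stream_mode.cu
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when this compilation unit uses the
// per-thread default stream, false when it uses the legacy one.
constexpr bool usesPerThreadDefaultStream() {
#ifdef CUDA_API_PER_THREAD_DEFAULT_STREAM
    return true;
#else
    return false;
#endif
}

int main() {
    std::printf("default stream mode: %s\n",
                usesPerThreadDefaultStream() ? "per-thread" : "legacy");
    return 0;
}
```

Because the setting is per compilation unit, a check like this only describes the file it is compiled into, not the whole program.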

Legacy default stream

The legacy default stream is an implicit stream which synchronizes with all other streams in the same CUcontext except for non-blocking streams, described below. (For applications using the runtime APIs only, there will be one context per device.) When an action is taken in the legacy stream such as a kernel launch or cudaStreamWaitEvent(), the legacy stream first waits on all blocking streams, the action is queued in the legacy stream, and then all blocking streams wait on the legacy stream.

For example, the following code launches a kernel k_1 in stream s, then k_2 in the legacy stream, then k_3 in stream s:

    k_1<<<1, 1, 0, s>>>();
    k_2<<<1, 1>>>();
    k_3<<<1, 1, 0, s>>>();

The resulting behavior is that k_2 will block on k_1 and k_3 will block on k_2.
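The ordering described above can be observed with a small, self-contained program. This is only a sketch under stated assumptions: the file is built with the legacy default stream behavior, stream s is an ordinary blocking stream made with cudaStreamCreate, and the names record, runLegacyOrdering, and CHECK are hypothetical helpers, not part of the CUDA API. One kernel is launched three times with ids 1, 2, 3 in place of k_1, k_2, k_3, and each launch writes its id into the next free slot of a device array.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the first CUDA error from the enclosing function (sketch only:
// resources are not released on the error path).
#define CHECK(call) \
    do { cudaError_t e_ = (call); if (e_ != cudaSuccess) return e_; } while (0)

// Each launch stores its id in the next free slot of `order`.
__global__ void record(int id, int *order, int *next) {
    int slot = atomicAdd(next, 1);
    order[slot] = id;
}

// Hypothetical helper: runs the k_1 / k_2 / k_3 launch pattern and copies
// the recorded execution order into out[0..2].
cudaError_t runLegacyOrdering(int out[3]) {
    int *order = nullptr, *next = nullptr;
    cudaStream_t s;
    CHECK(cudaStreamCreate(&s));              // blocking stream
    CHECK(cudaMalloc(&order, 3 * sizeof(int)));
    CHECK(cudaMalloc(&next, sizeof(int)));
    CHECK(cudaMemset(next, 0, sizeof(int)));

    record<<<1, 1, 0, s>>>(1, order, next);   // k_1 in stream s
    record<<<1, 1>>>(2, order, next);         // k_2 in the default stream
    record<<<1, 1, 0, s>>>(3, order, next);   // k_3 in stream s

    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaMemcpy(out, order, 3 * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaStreamDestroy(s));
    CHECK(cudaFree(order));
    CHECK(cudaFree(next));
    return cudaSuccess;
}

int main() {
    int out[3] = {0, 0, 0};
    cudaError_t err = runLegacyOrdering(out);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("order: %d %d %d\n", out[0], out[1], out[2]);
    return 0;
}
```

With legacy behavior the program should print `order: 1 2 3`, because each launch has to wait for the previous one. If the same file were built with per-thread behavior, or if s were a non-blocking stream, that order would no longer be guaranteed.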

Non-blocking streams which do not synchronize with the legacy stream can be created using the cudaStreamNonBlocking flag with the stream creation APIs.
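A short sketch of creating such a stream with cudaStreamCreateWithFlags, one of the stream creation APIs, and reading the flag back with cudaStreamGetFlags. The helper name createsNonBlockingStream is hypothetical and exists only for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: creates a non-blocking stream, queries its flags,
// and reports whether cudaStreamNonBlocking is set on it.
bool createsNonBlockingStream() {
    cudaStream_t s;
    if (cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking) != cudaSuccess)
        return false;

    unsigned int flags = 0;
    bool ok = cudaStreamGetFlags(s, &flags) == cudaSuccess &&
              (flags & cudaStreamNonBlocking) != 0;

    cudaStreamDestroy(s);
    return ok;
}

int main() {
    std::printf("non-blocking stream created: %s\n",
                createsNonBlockingStream() ? "yes" : "no");
    return 0;
}
```

Work queued in a stream created this way is not ordered against the legacy default stream, so any dependency between the two has to be expressed explicitly, for example with events.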

The legacy default stream can be used explicitly with the CUstream (cudaStream_t) handle CU_STREAM_LEGACY (cudaStreamLegacy).
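The sketch below passes cudaStreamLegacy explicitly for a kernel launch, a copy, and a synchronization. Naming the handle makes the code select the legacy stream regardless of how the compilation unit's default stream is configured. The kernel setValue and the helper writeViaLegacyStream are hypothetical names used only here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void setValue(int *p, int v) { *p = v; }

// Hypothetical helper: writes 42 on the device and copies it back,
// with every operation queued in the legacy default stream.
cudaError_t writeViaLegacyStream(int *hostOut) {
    int *d = nullptr;
    cudaError_t err = cudaMalloc(&d, sizeof(int));
    if (err != cudaSuccess) return err;

    setValue<<<1, 1, 0, cudaStreamLegacy>>>(d, 42);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(hostOut, d, sizeof(int),
                              cudaMemcpyDeviceToHost, cudaStreamLegacy);
    if (err == cudaSuccess)
        err = cudaStreamSynchronize(cudaStreamLegacy);  // wait for the copy

    cudaFree(d);
    return err;
}

int main() {
    int value = 0;
    cudaError_t err = writeViaLegacyStream(&value);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("value: %d\n", value);
    return 0;
}
```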

Per-thread default stream

The per-thread default stream is an implicit stream local to both the thread and the CUcontext, and which does not synchronize with other streams (just like explicitly created streams). The per-thread default stream is not a non-blocking stream and will synchronize with the legacy default stream if both are used in a program.

The per-thread default stream can be used explicitly with the CUstream (cudaStream_t) handle CU_STREAM_PER_THREAD (cudaStreamPerThread).
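As an illustration, the sketch below starts several host threads that each queue a kernel and a copy in cudaStreamPerThread, so every thread works in its own default stream without creating streams by hand. The names fill, worker, and runPerThreadWorkers are hypothetical, and the sketch assumes a host toolchain where std::thread is available (some platforms may need an extra flag such as `-Xcompiler -pthread`).

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Runs in one host thread; all GPU work goes to that thread's own
// default stream. Stores `id` in *result on success, -1 on failure.
void worker(int id, int *result) {
    const int n = 256;
    int host[n];
    int *dev = nullptr;
    *result = -1;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return;

    fill<<<1, n, 0, cudaStreamPerThread>>>(dev, n, id);
    cudaError_t err = cudaMemcpyAsync(host, dev, n * sizeof(int),
                                      cudaMemcpyDeviceToHost, cudaStreamPerThread);
    if (err == cudaSuccess)
        err = cudaStreamSynchronize(cudaStreamPerThread);
    if (err == cudaSuccess) *result = host[n - 1];

    cudaFree(dev);
}

// Hypothetical helper: launches `numThreads` workers and checks that each
// one read back the value its own kernel wrote.
bool runPerThreadWorkers(int numThreads) {
    std::vector<int> results(numThreads, -1);
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, t + 1, &results[t]);
    for (auto &th : threads) th.join();
    for (int t = 0; t < numThreads; ++t)
        if (results[t] != t + 1) return false;
    return true;
}

int main() {
    std::printf("per-thread workers ok: %s\n",
                runPerThreadWorkers(4) ? "yes" : "no");
    return 0;
}
```

Using the explicit handle keeps the example independent of the compilation unit's default-stream setting. Each thread synchronizes only its own stream before reading its result.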


