HLSL Shader Model 6.0

Describes the wave operation intrinsics added to HLSL Shader Model 6.0.

Shader Model 6.0

Earlier shader models expose only a single thread of execution to the HLSL programmer. Starting with Shader Model 6.0, new wave-level operations let shaders explicitly exploit the parallelism of current GPUs, where many threads execute in lockstep on the same core. For example, the Shader Model 6.0 intrinsics make it possible to eliminate barrier constructs when the scope of synchronization is within the width of the SIMD processor, or within some other set of threads that are known to be atomic relative to each other.

Potential use cases include: stream compaction, reductions, block transpose, bitonic sort or Fast Fourier Transforms (FFT), binning, stream de-duplication, and similar scenarios.

Most of the intrinsics are available in both pixel shaders and compute shaders, though there are some exceptions (noted for each function). The functions have been added to the requirements for DirectX Feature Level 12.0, under API level 12.

The <type> parameter and return value for these functions take the type of the expression. The supported types are those from the following list that are also present in the target shader model for your app:

  • half, half2, half3, half4
  • float, float2, float3, float4
  • double, double2, double3, double4
  • int, int2, int3, int4
  • uint, uint2, uint3, uint4
  • short, short2, short3, short4
  • ushort, ushort2, ushort3, ushort4
  • uint64_t, uint64_t2, uint64_t3, uint64_t4

Some operations (such as the bitwise operators) only support the integer types.

Terminology

| Term | Definition |
|---|---|
| Lane | A single thread of execution. The shader models before version 6.0 expose only one of these at the language level, leaving expansion to parallel SIMD processing entirely up to the implementation. |
| Wave | A set of lanes (threads) executed simultaneously in the processor. No explicit barriers are required to guarantee that they execute in parallel. Similar concepts include "warp" and "wavefront." |
| Inactive Lane | A lane which is not being executed, for example due to the flow of control, or insufficient work to fill the minimum size of the wave. |
| Active Lane | A lane for which execution is being performed. In pixel shaders, it may include any helper pixel lanes. |
| Quad | A set of 4 adjacent lanes corresponding to pixels arranged in a 2x2 square. They are used to estimate gradients by differencing in either x or y. A wave may be comprised of multiple quads. All pixels in an active quad are executed (and may be "Active Lanes"), but those that do not produce visible results are termed "Helper Lanes". |
| Helper Lane | A lane which is executed solely for the purpose of gradients in pixel shader quads. The output of such a lane will be discarded, and so not render to the destination surface. |

Shading language intrinsics

All the operations of this shader model are exposed through a range of intrinsic functions.

Wave Query

The intrinsics for querying a single wave.

| Intrinsic | Description | Pixel shader | Compute shader |
|---|---|---|---|
| WaveGetLaneCount | Returns the number of lanes in the current wave. | * | * |
| WaveGetLaneIndex | Returns the index of the current lane within the current wave. | * | * |
| WaveIsFirstLane | Returns true only for the active lane in the current wave with the smallest index. | * | * |
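
As a rough illustration of the query intrinsics, the following compute shader sketch records what each lane observes. The buffer name `gWaveInfo`, its register, the entry point name `CSMain`, and the thread group size of 64 are assumptions made for this example and do not come from the article. It also assumes a Shader Model 6.0 compute target and a dispatch that does not exceed the buffer size.

```hlsl
// Hypothetical output buffer: one entry per dispatched thread (name and register are assumptions).
RWStructuredBuffer<uint3> gWaveInfo : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    // Number of lanes in a wave on this device; the value is hardware dependent.
    uint laneCount = WaveGetLaneCount();

    // Index of this lane within its wave, in the range 0 .. laneCount - 1.
    uint laneIndex = WaveGetLaneIndex();

    // True only for the active lane with the smallest index in this wave.
    bool isFirst = WaveIsFirstLane();

    gWaveInfo[dtid.x] = uint3(laneCount, laneIndex, isFirst ? 1u : 0u);
}
```

Reading the buffer back on the CPU is one way to see how a thread group is split into waves on a particular GPU, since the lane count is not fixed by the language.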

Wave Vote

This set of intrinsics compares values across the threads that are currently active in the current wave.

| Intrinsic | Description | Pixel shader | Compute shader |
|---|---|---|---|
| WaveActiveAnyTrue | Returns true if the expression is true in any active lane in the current wave. | * | * |
| WaveActiveAllTrue | Returns true if the expression is true in all active lanes in the current wave. | * | * |
| WaveActiveBallot | Returns a uint4 bitmask (up to 128 bits, one bit per lane) of the evaluation of the Boolean expression for all active lanes in the current wave. | * | * |
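
The vote intrinsics can be sketched as follows. The buffer names, the registers, the 0.5 threshold, and the packing of the results into one `uint` are all assumptions chosen for illustration. Note that `WaveActiveBallot` is declared in HLSL as returning a `uint4`, so the example counts bits in all four components.

```hlsl
// Hypothetical resources (names and registers are assumptions).
StructuredBuffer<float>  gInput  : register(t0);
RWStructuredBuffer<uint> gOutput : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    // Per-lane predicate; the threshold is arbitrary.
    bool above = gInput[dtid.x] > 0.5f;

    // Both results are uniform across the active lanes of the wave.
    bool anyAbove = WaveActiveAnyTrue(above);
    bool allAbove = WaveActiveAllTrue(above);

    // One bit per lane, set where the predicate is true on an active lane.
    uint4 ballot = WaveActiveBallot(above);
    uint votes = countbits(ballot.x) + countbits(ballot.y)
               + countbits(ballot.z) + countbits(ballot.w);

    // Pack the results: bit 0 = any, bit 1 = all, remaining bits = vote count.
    gOutput[dtid.x] = (anyAbove ? 1u : 0u) | (allAbove ? 2u : 0u) | (votes << 2);
}
```

A common use of `WaveActiveAllTrue` or `WaveActiveAnyTrue` is to choose a fast path only when the whole wave agrees, which avoids divergent control flow inside the wave.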

Wave Broadcast

These intrinsics enable all active lanes in the current wave to receive the value from the specified lane, effectively broadcasting it. The return value from an invalid lane is undefined.

| Intrinsic | Description | Pixel shader | Compute shader |
|---|---|---|---|
| WaveReadLaneAt | Returns the value of the expression for the given lane index within the current wave. | * | * |
| WaveReadLaneFirst | Returns the value of the expression for the active lane of the current wave with the smallest index. | * | * |
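
One possible use of broadcasting is to let a single lane perform an atomic operation and then share the result with the rest of the wave. In this sketch the counter and slot buffers, their registers, and the policy of reserving one slot per lane in the wave are assumptions for illustration only.

```hlsl
// Hypothetical resources (names and registers are assumptions).
RWStructuredBuffer<uint> gCounter : register(u0); // gCounter[0] holds the next free slot
RWStructuredBuffer<uint> gSlots   : register(u1); // receives the slot assigned to each thread

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    uint waveBase = 0;

    // Only the first active lane touches the global counter,
    // reserving a block of slots large enough for the whole wave.
    if (WaveIsFirstLane())
    {
        InterlockedAdd(gCounter[0], WaveGetLaneCount(), waveBase);
    }

    // Broadcast the reserved base offset from the first active lane to all active lanes.
    waveBase = WaveReadLaneFirst(waveBase);

    // Each lane takes its own slot inside the block reserved for the wave.
    gSlots[dtid.x] = waveBase + WaveGetLaneIndex();
}
```

This issues one atomic per wave rather than one per thread. Slots that belong to inactive lanes are simply left unused in this simplified scheme.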

Wave Reduction

These intrinsics compute the specified operation across all active lanes in the wave and broadcast the final result to all active lanes. Therefore, the final output is guaranteed uniform across the wave.

| Intrinsic | Description | Pixel shader | Compute shader |
|---|---|---|---|
| WaveActiveAllEqual | Returns true if the expression is the same for every active lane in the current wave (and thus uniform across it). | * | * |
| WaveActiveBitAnd | Returns the bitwise AND of all the values of the expression across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveBitOr | Returns the bitwise OR of all the values of the expression across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveBitXor | Returns the bitwise Exclusive OR of all the values of the expression across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveCountBits | Counts the number of active lanes in the current wave for which the Boolean expression evaluates to true, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveMax | Computes the maximum value of the expression across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveMin | Computes the minimum value of the expression across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveProduct | Multiplies the values of the expression together across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
| WaveActiveSum | Sums the values of the expression across all active lanes in the current wave, and replicates the result to all lanes in the wave. | * | * |
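
A reduction can be used to cut down on atomic traffic when accumulating a global total. The sketch below is a minimal example under stated assumptions: the buffers `gValues` and `gTotal`, their registers, and the use of `uint` data are all choices made for this illustration.

```hlsl
// Hypothetical resources (names and registers are assumptions).
StructuredBuffer<uint>   gValues : register(t0);
RWStructuredBuffer<uint> gTotal  : register(u0); // gTotal[0] accumulates the overall sum

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    uint value = gValues[dtid.x];

    // Sum over all active lanes; every active lane receives the same result.
    uint waveSum = WaveActiveSum(value);

    // A single lane per wave adds the partial sum to the global total.
    if (WaveIsFirstLane())
    {
        InterlockedAdd(gTotal[0], waveSum);
    }
}
```

No group-shared memory or barrier is needed for the per-wave part of the reduction, which is the kind of saving described at the start of this article.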

Wave Scan and Prefix

These intrinsics apply the operation to each lane and leave each partial result of the computation in the corresponding lane.

| Intrinsic | Description | Pixel shader | Compute shader |
|---|---|---|---|
| WavePrefixCountBits | Returns the sum of all the specified boolean variables set to true across all active lanes with indices smaller than the current lane. | * | * |
| WavePrefixSum | Returns the sum of all of the values in the active lanes with smaller indices than this one. | * | * |
| WavePrefixProduct | Returns the product of all of the values in the lanes before this one of the specified wave. | * | * |
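
Stream compaction, one of the use cases listed earlier, is a natural fit for the prefix intrinsics. The following is a sketch rather than a definitive implementation: the buffer names, registers, and the "keep positive values" predicate are assumptions, and the output order across different waves is not preserved.

```hlsl
// Hypothetical resources (names and registers are assumptions).
StructuredBuffer<float>   gInput   : register(t0);
RWStructuredBuffer<float> gOutput  : register(u0); // compacted values
RWStructuredBuffer<uint>  gCounter : register(u1); // gCounter[0] = number of values written so far

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    float value = gInput[dtid.x];
    bool keep = value > 0.0f; // arbitrary predicate for the example

    // Total number of kept elements in this wave (uniform across the wave).
    uint waveKept = WaveActiveCountBits(keep);

    // Number of kept elements in active lanes with a smaller index than this lane.
    uint laneOffset = WavePrefixCountBits(keep);

    // One atomic per wave reserves a contiguous output range.
    uint waveBase = 0;
    if (WaveIsFirstLane())
    {
        InterlockedAdd(gCounter[0], waveKept, waveBase);
    }
    waveBase = WaveReadLaneFirst(waveBase);

    // Each kept element lands in its own slot within the reserved range.
    if (keep)
    {
        gOutput[waveBase + laneOffset] = value;
    }
}
```

Because the prefix count excludes the current lane, the first kept lane in a wave gets offset 0, the next gets 1, and so on, so the kept values are packed without gaps within each wave's range.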

Quad-wide Shuffle operations

These intrinsics perform swap operations on the values across a wave known to contain pixel shader quads as defined here. The indices of the pixels in the quad are defined in scan-line or reading order - where the coordinates within a quad are:

    +---------> X
    |
    |   [0] [1]
    |   [2] [3]
    v
    Y

These routines work in either compute shaders or pixel shaders. In compute shaders they operate on quads defined as evenly divided groups of 4 lanes within a SIMD wave. In pixel shaders they should be used on waves captured by WaveQuadLanes; otherwise results are undefined.

| Intrinsic | Description | Pixel shader | Compute shader |
|---|---|---|---|
| QuadReadLaneAt | Returns the specified source value read from the lane of the current quad identified by quadLaneID [0..3] which must be uniform across the quad. | * | |
| QuadReadAcrossDiagonal | Returns the specified local value which is read from the diagonally opposite lane in this quad. | * | |
| QuadReadAcrossX | Returns the specified source value read from the other lane in this quad in the X direction. | * | |
| QuadReadAcrossY | Returns the specified source value read from the other lane in this quad in the Y direction. | * | |
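
The quad intrinsics can be sketched in a pixel shader that averages a per-pixel value over its 2x2 quad. The texture, sampler, and entry point names, the luma weights, and the way the results are packed into the output color are assumptions made for this example.

```hlsl
// Hypothetical resources (names and registers are assumptions).
Texture2D<float4> gTexture : register(t0);
SamplerState      gSampler : register(s0);

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 color = gTexture.Sample(gSampler, uv);
    float luma = dot(color.rgb, float3(0.299f, 0.587f, 0.114f));

    // Values held by the other three lanes of this quad.
    float lumaX = QuadReadAcrossX(luma);        // horizontal neighbor
    float lumaY = QuadReadAcrossY(luma);        // vertical neighbor
    float lumaD = QuadReadAcrossDiagonal(luma); // diagonal neighbor

    // Average over the quad; every lane of the quad computes the same four-value sum.
    float quadAverage = 0.25f * (luma + lumaX + lumaY + lumaD);

    // Value held by lane [0] of the quad (top left); the index is a constant,
    // so it is uniform across the quad as required.
    float topLeft = QuadReadLaneAt(luma, 0);

    return float4(quadAverage, topLeft, abs(luma - lumaX), 1.0f);
}
```

The third output channel is a hand-rolled horizontal difference, similar in spirit to the gradient estimation that quads exist to support.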

Hardware capability

To check that the wave operation features are available on any specific hardware, call ID3D12Device::CheckFeatureSupport, noting the description and use of the D3D12_FEATURE_DATA_D3D12_OPTIONS1 structure.
