Torch Stable API #

The PyTorch Stable C++ API provides a convenient high-level interface for calling ABI-stable tensor operations and other utilities commonly used in custom operators. These functions are designed to maintain binary compatibility across PyTorch versions, making them suitable for use in ahead-of-time compiled code.

For more information on the stable ABI, see the Stable ABI notes.

Library Registration Macros#

These macros provide stable ABI equivalents of the standard PyTorch operator registration macros (TORCH_LIBRARY, TORCH_LIBRARY_IMPL, etc.). Use these when building custom operators that need to maintain binary compatibility across PyTorch versions.

`STABLE_TORCH_LIBRARY(ns,m)`#

Defines a library of operators in a namespace using the stable ABI.

This is the stable ABI equivalent of TORCH_LIBRARY. Use this macro to define operator schemas that will maintain binary compatibility across PyTorch versions. Only one STABLE_TORCH_LIBRARY block can exist per namespace; use STABLE_TORCH_LIBRARY_FRAGMENT for additional definitions in the same namespace from different translation units.

Parameters:

ns - The namespace in which to define operators (e.g., mylib).
m - The name of the StableLibrary variable available in the block.

Example:

    STABLE_TORCH_LIBRARY(mylib, m) {
      m.def("my_op(Tensor input, int size) -> Tensor");
      m.def("another_op(Tensor a, Tensor b) -> Tensor");
    }

Minimum compatible version: PyTorch 2.9.

`STABLE_TORCH_LIBRARY_IMPL(ns,k,m)`#

Registers operator implementations for a specific dispatch key using the stable ABI.

This is the stable ABI equivalent of TORCH_LIBRARY_IMPL. Use this macro to provide implementations of operators for a specific dispatch key (e.g., CPU, CUDA) while maintaining binary compatibility across PyTorch versions.

Note

All kernel functions registered with this macro must be boxed using the TORCH_BOX macro.

Parameters:

ns - The namespace in which the operators are defined.
k - The dispatch key (e.g., CPU, CUDA).
m - The name of the StableLibrary variable available in the block.

Example:

    STABLE_TORCH_LIBRARY_IMPL(mylib, CPU, m) {
      m.impl("my_op", TORCH_BOX(&my_cpu_kernel));
    }

    STABLE_TORCH_LIBRARY_IMPL(mylib, CUDA, m) {
      m.impl("my_op", TORCH_BOX(&my_cuda_kernel));
    }

Minimum compatible version: PyTorch 2.9.

`STABLE_TORCH_LIBRARY_FRAGMENT(ns,m)`#

Extends operator definitions in an existing namespace using the stable ABI.

This is the stable ABI equivalent of TORCH_LIBRARY_FRAGMENT. Use this macro to add additional operator definitions to a namespace that was already created with STABLE_TORCH_LIBRARY.

Parameters:

ns - The namespace to extend.
m - The name of the StableLibrary variable available in the block.

Minimum compatible version: PyTorch 2.9.

`TORCH_BOX(&func)`#

Wraps a function to conform to the stable boxed kernel calling convention.

This macro takes an unboxed kernel function pointer and generates a boxed wrapper that can be registered with the stable library API.

Parameters:

func - The unboxed kernel function to wrap.

Example:

    Tensor my_kernel(const Tensor& input, int64_t size) {
      return input.reshape({size});
    }

    STABLE_TORCH_LIBRARY_IMPL(my_namespace, CPU, m) {
      m.impl("my_op", TORCH_BOX(&my_kernel));
    }

Minimum compatible version: PyTorch 2.9.

Tensor Class#

The torch::stable::Tensor class offers a user-friendly C++ interface similar to torch::Tensor while maintaining binary compatibility across PyTorch versions.

class Tensor#

An ABI stable wrapper around PyTorch tensors.

This class is modeled after TensorBase, as custom op kernels primarily need to interact with Tensor metadata (sizes, strides, device, dtype). Other tensor operations (like empty_like) exist as standalone functions outside of this struct.

Minimum compatible version: PyTorch 2.9.

Public Functions

inline Tensor()#

Constructs a Tensor with an uninitialized AtenTensorHandle.

Creates a new stable::Tensor by allocating an uninitialized tensor handle. The ownership of the handle is managed internally via shared_ptr.

Minimum compatible version: PyTorch 2.9.

inline explicit Tensor(AtenTensorHandle ath)#

Constructs a Tensor from an existing AtenTensorHandle.

Steals ownership of the provided AtenTensorHandle.

Minimum compatible version: PyTorch 2.9.

Parameters: ath – The AtenTensorHandle to wrap. Ownership is transferred to this Tensor.

inline AtenTensorHandle get() const#

Returns a borrowed reference to the underlying AtenTensorHandle.

Minimum compatible version: PyTorch 2.9.

Returns: The underlying AtenTensorHandle.

inline void* data_ptr() const#

Returns a pointer to the tensor’s data.

Minimum compatible version: PyTorch 2.9.

Returns: A void pointer to the tensor’s data storage.

inline void* mutable_data_ptr() const#

Returns a mutable pointer to the tensor’s data.

Minimum compatible version: PyTorch 2.10.

Returns: A mutable void pointer to the tensor’s data storage.

inline const void* const_data_ptr() const#

Returns a const pointer to the tensor’s data.

Minimum compatible version: PyTorch 2.10.

Returns: A const void pointer to the tensor’s data storage.

template<typename T> T* mutable_data_ptr() const#

Returns a typed mutable pointer to the tensor’s data.

Minimum compatible version: PyTorch 2.10.

Template Parameters: T – The type to cast the data pointer to.
Returns: A mutable pointer to the tensor’s data cast to type T*.

template<typename T, std::enable_if_t<!std::is_const_v<T>, int> = 0> const T* const_data_ptr() const#

Returns a typed const pointer to the tensor’s data.

Minimum compatible version: PyTorch 2.10.

Template Parameters: T – The type to cast the data pointer to. Must not be const-qualified.
Returns: A const pointer to the tensor’s data cast to type const T*.

inline const Tensor& set_requires_grad(bool requires_grad) const#

Sets whether this tensor requires gradient computation.

Minimum compatible version: PyTorch 2.10.

Parameters: requires_grad – If true, gradients will be computed for this tensor during backpropagation.
Returns: A reference to this Tensor.

inline int64_t dim() const#

Returns the number of dimensions of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The number of dimensions (rank) of the tensor.

inline int64_t numel() const#

Returns the total number of elements in the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The total number of elements across all dimensions.

inline IntHeaderOnlyArrayRef sizes() const#

Returns the sizes (shape) of the tensor.

Returns a borrowed reference to the dimension sizes of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: An IntHeaderOnlyArrayRef containing the size of each dimension.

inline IntHeaderOnlyArrayRef strides() const#

Returns the strides of the tensor.

Returns a borrowed reference to the strides of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: An IntHeaderOnlyArrayRef containing the stride of each dimension.

inline bool is_contiguous() const#

Checks if the tensor is contiguous in memory.

Minimum compatible version: PyTorch 2.9.

Note

This is a subset of the original TensorBase API. It takes no arguments whereas the original API takes a memory format argument. Here, we assume the default contiguous memory format.

Returns: true if the tensor is contiguous, false otherwise.
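To make the default contiguous format concrete, the sketch below checks contiguity from sizes and strides alone. It is a standalone illustration, not the library's implementation: `is_default_contiguous` is a hypothetical helper, and plain `std::vector` stands in for `IntHeaderOnlyArrayRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the stable API): returns true when the
// strides describe a dense row-major layout for the given sizes.
inline bool is_default_contiguous(const std::vector<int64_t>& sizes,
                                  const std::vector<int64_t>& strides) {
  if (sizes.size() != strides.size()) {
    return false;
  }
  // A tensor with no elements is treated as contiguous.
  for (int64_t s : sizes) {
    if (s == 0) {
      return true;
    }
  }
  // Walk from the innermost dimension outward. The expected stride of a
  // dimension is the product of the sizes of all dimensions to its right.
  int64_t expected = 1;
  for (std::size_t i = sizes.size(); i-- > 0;) {
    // The stride of a size-1 dimension never affects addressing, so skip it.
    if (sizes[i] != 1 && strides[i] != expected) {
      return false;
    }
    expected *= sizes[i];
  }
  return true;
}
```

For a 2x3 tensor, strides {3, 1} pass this check while the transposed strides {1, 2} do not.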

inline int64_t stride(int64_t dim) const#

Returns the stride of a specific dimension.

Minimum compatible version: PyTorch 2.9.

Parameters: dim – The dimension index to query.
Returns: The stride of the specified dimension.

inline DeviceIndex get_device_index() const#

Returns the device index of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The device index as DeviceIndex (int32_t).

inline bool is_cuda() const#

Checks if the tensor is on a CUDA device.

Minimum compatible version: PyTorch 2.9.

Returns: true if the tensor is on a CUDA device, false otherwise.

inline bool is_cpu() const#

Checks if the tensor is on the CPU.

Minimum compatible version: PyTorch 2.9.

Returns: true if the tensor is on the CPU, false otherwise.

inline int64_t size(int64_t dim) const#

Returns the size of a specific dimension.

Minimum compatible version: PyTorch 2.9.

Parameters: dim – The dimension index to query.
Returns: The size of the specified dimension.

inline bool defined() const#

Checks if the tensor is defined (not null).

Minimum compatible version: PyTorch 2.9.

Returns: true if the tensor is defined, false otherwise.

inline int64_t storage_offset() const#

Returns the storage offset of the tensor.

The storage offset is the number of elements from the beginning of the underlying storage to the first element of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The storage offset in number of elements.
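The relationship between strides, storage offset, and element position can be sketched as follows. This is a standalone illustration under the usual strided-layout assumption; `element_offset` is a hypothetical helper and does not call the stable API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: position (in elements, not bytes) of the element at
// `index` within the underlying storage of a strided tensor.
inline int64_t element_offset(const std::vector<int64_t>& strides,
                              int64_t storage_offset,
                              const std::vector<int64_t>& index) {
  int64_t offset = storage_offset;
  // Each index component advances by the stride of its dimension.
  for (std::size_t d = 0; d < index.size() && d < strides.size(); ++d) {
    offset += index[d] * strides[d];
  }
  return offset;
}
```

With strides {3, 1} and a storage offset of 0, the element at index (1, 2) sits 5 elements into the storage; multiplying by the element size gives the byte offset.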

inline size_t element_size() const#

Returns the size in bytes of each element in the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The element size in bytes.

ScalarType scalar_type() const#

Returns the scalar type (dtype) of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The ScalarType of the tensor.

Device device() const#

Returns the device of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The Device on which the tensor resides.

Layout layout() const#

Returns the layout of the tensor.

Minimum compatible version: PyTorch 2.9.

Returns: The Layout of the tensor (e.g., Strided, Sparse).

Device Class#

The torch::stable::Device class provides a user-friendly C++ interface similar to c10::Device while maintaining binary compatibility across PyTorch versions. It represents a compute device (CPU, CUDA, etc.) with an optional device index.

class Device#

A stable version of c10::Device.

Minimum compatible version: PyTorch 2.9.

Public Functions

inline Device(DeviceType type, DeviceIndex index=-1)#

Constructs a Device from a DeviceType and optional device index.

Minimum compatible version: PyTorch 2.9.

Parameters:

type – The type of device (e.g., DeviceType::CPU, DeviceType::CUDA).
index – The device index. Default is -1 (current device).

Device(const std::string& device_string)#

Constructs a stable::Device from a string description.

The string must follow the schema: (cpu|cuda|…)[:<device-index>]

Minimum compatible version: PyTorch 2.10.

Parameters: device_string – A string describing the device (e.g., “cuda:0”, “cpu”).
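As a rough illustration of the string schema, the sketch below splits a device string into a type name and an optional index. It is not the constructor's actual implementation: `ParsedDevice` and `parse_device_string` are hypothetical names, and the sketch does not validate the type name against the set of supported devices.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: index is -1 when the string carries no index.
struct ParsedDevice {
  std::string type;
  int index;
};

// Hypothetical parser for strings shaped like "<type>[:<device-index>]".
inline ParsedDevice parse_device_string(const std::string& s) {
  const std::size_t colon = s.find(':');
  // substr(0, npos) yields the whole string when there is no colon.
  ParsedDevice out{s.substr(0, colon), -1};
  if (out.type.empty()) {
    throw std::invalid_argument("missing device type");
  }
  if (colon != std::string::npos) {
    const std::string digits = s.substr(colon + 1);
    if (digits.empty()) {
      throw std::invalid_argument("missing device index after ':'");
    }
    for (char c : digits) {
      if (!std::isdigit(static_cast<unsigned char>(c))) {
        throw std::invalid_argument("device index must be a number");
      }
    }
    out.index = std::stoi(digits);
  }
  return out;
}
```

Under this sketch, "cuda:0" yields type "cuda" with index 0, and "cpu" yields type "cpu" with index -1, matching the convention that -1 means no specific index.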

inline bool operator==(const Device& other) const noexcept#

Checks if two devices are equal.

Minimum compatible version: PyTorch 2.9.

Parameters: other – The device to compare with.
Returns: true if both type and index match, false otherwise.

inline bool operator!=(const Device& other) const noexcept#

Checks if two devices are not equal.

Minimum compatible version: PyTorch 2.9.

Parameters: other – The device to compare with.
Returns: true if type or index differ, false otherwise.

inline void set_index(DeviceIndex index)#

Sets the device index.

Minimum compatible version: PyTorch 2.9.

Parameters: index – The new device index.

inline DeviceType type() const noexcept#

Returns the device type.

Minimum compatible version: PyTorch 2.9.

Returns: The DeviceType of this device.

inline DeviceIndex index() const noexcept#

Returns the device index.

Minimum compatible version: PyTorch 2.9.

Returns: The device index, or -1 if no specific index is set.

inline bool has_index() const noexcept#

Checks if this device has a specific index.

Minimum compatible version: PyTorch 2.9.

Returns: true if index is not -1, false otherwise.

inline bool is_cuda() const noexcept#

Checks if this is a CUDA device.

Minimum compatible version: PyTorch 2.9.

Returns: true if the device type is CUDA, false otherwise.

inline bool is_cpu() const noexcept#

Checks if this is a CPU device.

Minimum compatible version: PyTorch 2.9.

Returns: true if the device type is CPU, false otherwise.

DeviceGuard Class#

The torch::stable::accelerator::DeviceGuard provides a user-friendly C++ interface similar to c10::DeviceGuard while maintaining binary compatibility across PyTorch versions.

class DeviceGuard#

A stable ABI version of c10::DeviceGuard.

RAII class that sets the current device to the specified device index on construction and restores the previous device on destruction.

Minimum compatible version: PyTorch 2.9.

Public Functions

inline explicit DeviceGuard(DeviceIndex device_index)#

Constructs a DeviceGuard that sets the current device.

Minimum compatible version: PyTorch 2.9.

Parameters: device_index – The device index to set as the current device.

inline void set_index(DeviceIndex device_index)#

Changes the current device to the specified device index.

Minimum compatible version: PyTorch 2.9.

Parameters: device_index – The new device index to set.

inline DeviceIndex torch::stable::accelerator::getCurrentDeviceIndex()#

Gets the current device index.

Returns the index of the currently active device for the accelerator.

Minimum compatible version: PyTorch 2.9.

Returns: The current device index.
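The RAII behavior described for DeviceGuard (set on construction, restore on destruction) can be pictured with the standalone sketch below. Everything in the `sketch` namespace is hypothetical: a process-local integer stands in for the accelerator's current device, and no PyTorch or CUDA call is made.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using DeviceIndex = int32_t;

// Stand-in for the accelerator runtime's notion of "current device".
inline DeviceIndex& current_device() {
  static DeviceIndex index = 0;
  return index;
}

inline DeviceIndex get_current_device_index() {
  return current_device();
}

// Hypothetical guard mirroring the documented semantics: remember the
// previous device, switch to the requested one, restore on scope exit.
class DeviceGuardSketch {
 public:
  explicit DeviceGuardSketch(DeviceIndex device_index)
      : previous_(current_device()) {
    current_device() = device_index;
  }
  ~DeviceGuardSketch() {
    current_device() = previous_;
  }
  DeviceGuardSketch(const DeviceGuardSketch&) = delete;
  DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

  // Switches devices again; the destructor still restores the original one.
  void set_index(DeviceIndex device_index) {
    current_device() = device_index;
  }

 private:
  DeviceIndex previous_;
};

// Reports the device that is current while a guard is alive.
inline DeviceIndex index_inside_guard(DeviceIndex device_index) {
  DeviceGuardSketch guard(device_index);
  return get_current_device_index();
}

}  // namespace sketch
```

Because restoration happens in the destructor, the previous device comes back even when the guarded scope exits early through a return or an exception.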

Stream Utilities#

For CUDA stream access, we currently recommend the ABI stable C shim API. This will be improved in a future release with a more ergonomic wrapper.

Getting the Current CUDA Stream#

To obtain the current cudaStream_t for use in CUDA kernels:

    #include <torch/csrc/inductor/aoti_torch/c/shim.h>
    #include <torch/headeronly/util/shim_utils.h>

    // For now, we rely on the ABI stable C shim API to get the current CUDA stream.
    // This will be improved in a future release.
    // When using a C shim API, we need to use TORCH_ERROR_CODE_CHECK to
    // check the error code and throw an appropriate runtime_error otherwise.
    void* stream_ptr = nullptr;
    TORCH_ERROR_CODE_CHECK(
        aoti_torch_get_current_cuda_stream(tensor.get_device_index(), &stream_ptr));
    cudaStream_t stream = static_cast<cudaStream_t>(stream_ptr);

    // Now you can use 'stream' in your CUDA kernel launches
    my_kernel<<<blocks, threads, 0, stream>>>(args...);

Note

The TORCH_ERROR_CODE_CHECK macro is required when using C shim APIs to properly check error codes and throw appropriate exceptions.

CUDA Error Checking Macros#

These macros provide stable ABI equivalents for CUDA error checking. They wrap CUDA API calls and kernel launches, providing detailed error messages using PyTorch’s error formatting.

`STD_CUDA_CHECK(EXPR)`#

Checks the result of a CUDA API call and throws an exception on error. Users of this macro are expected to include cuda_runtime.h.

Example:

    STD_CUDA_CHECK(cudaMalloc(&ptr, size));
    STD_CUDA_CHECK(cudaMemcpy(dst, src, size, cudaMemcpyDeviceToHost));

Minimum compatible version: PyTorch 2.10.

`STD_CUDA_KERNEL_LAUNCH_CHECK()`#

Checks for errors from the most recent CUDA kernel launch. Equivalent to STD_CUDA_CHECK(cudaGetLastError()).

Example:

    my_kernel<<<blocks, threads, 0, stream>>>(args...);
    STD_CUDA_KERNEL_LAUNCH_CHECK();

Minimum compatible version: PyTorch 2.10.

Header-Only Utilities#

The torch::headeronly namespace provides header-only versions of common PyTorch types and utilities. These can be used without linking against libtorch, making them ideal for maintaining binary compatibility across PyTorch versions.

Error Checking#

STD_TORCH_CHECK is a header-only macro for runtime assertions:

    #include <torch/headeronly/util/Exception.h>

    STD_TORCH_CHECK(condition, "Error message with ", variable, " interpolation");
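The message-interpolation style shown above can be approximated with a small variadic macro. The sketch below is only an illustration of the pattern, not how STD_TORCH_CHECK is implemented: `SKETCH_CHECK`, `concat_message`, and `try_positive` are hypothetical names, and the real macro's exception type and message format may differ.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: streams every argument into one message string.
template <typename... Args>
std::string concat_message(const Args&... args) {
  std::ostringstream oss;
  // Expand the pack in order; the leading 0 keeps the array non-empty.
  int expand[] = {0, ((oss << args), 0)...};
  (void)expand;
  return oss.str();
}

// Hypothetical check macro: throws std::runtime_error with the concatenated
// message when the condition is false.
#define SKETCH_CHECK(cond, ...)                                \
  do {                                                         \
    if (!(cond)) {                                             \
      throw std::runtime_error(concat_message(__VA_ARGS__));   \
    }                                                          \
  } while (0)

// Small driver that reports either "ok" or the error message.
inline std::string try_positive(int n) {
  try {
    SKETCH_CHECK(n > 0, "expected n > 0, got ", n);
    return "ok";
  } catch (const std::runtime_error& e) {
    return e.what();
  }
}
```

Passing message pieces as separate arguments means the string is only built on the failure path, which keeps the passing case cheap.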

Core Types#

The following c10:: types are available as header-only versions under torch::headeronly:

torch::headeronly::ScalarType - Tensor data types (Float, Double, Int, etc.)
torch::headeronly::DeviceType - Device types (CPU, CUDA, etc.)
torch::headeronly::MemoryFormat - Memory layout formats (Contiguous, ChannelsLast, etc.)
torch::headeronly::Layout - Tensor layouts (Strided, Sparse, etc.)

    #include <torch/headeronly/core/ScalarType.h>
    #include <torch/headeronly/core/DeviceType.h>
    #include <torch/headeronly/core/MemoryFormat.h>
    #include <torch/headeronly/core/Layout.h>

    auto dtype = torch::headeronly::ScalarType::Float;
    auto device_type = torch::headeronly::DeviceType::CUDA;
    auto memory_format = torch::headeronly::MemoryFormat::Contiguous;
    auto layout = torch::headeronly::Layout::Strided;

TensorAccessor#

TensorAccessor provides efficient, bounds-checked access to tensor data. You can construct one from a stable tensor’s data pointer, sizes, and strides:

    #include <torch/headeronly/core/TensorAccessor.h>

    // Create a TensorAccessor for a 2D float tensor
    auto sizes = tensor.sizes();
    auto strides = tensor.strides();
    torch::headeronly::TensorAccessor<float, 2> accessor(
        static_cast<float*>(tensor.mutable_data_ptr()),
        sizes.data(),
        strides.data());

    // Access elements
    float value = accessor[i][j];

Dispatch Macros#

Header-only dispatch macros (THO = Torch Header Only) are available for dtype and device dispatching:

    #include <torch/headeronly/core/Dispatch.h>

    THO_DISPATCH_FLOATING_TYPES(tensor.scalar_type(), "my_kernel", [&] {
      // scalar_t is the resolved type
      auto* data = tensor.data_ptr<scalar_t>();
    });
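Conceptually, a dtype dispatch macro turns a runtime type tag into a compile-time type for the body it wraps. The standalone sketch below shows that idea with a switch and a generic lambda; it is not how the THO macros are implemented, and `SketchDType`, `dispatch_floating`, and `element_size_of` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical runtime tag standing in for a scalar type enum.
enum class SketchDType { Float, Double };

// Calls `f` with a value whose static type matches the runtime tag, so the
// body of `f` is instantiated once per supported type.
template <typename F>
auto dispatch_floating(SketchDType dtype, F&& f) {
  switch (dtype) {
    case SketchDType::Float:
      return f(float{});
    case SketchDType::Double:
      return f(double{});
  }
  throw std::invalid_argument("unsupported dtype");
}

// Example body: recover scalar_t from the tag value and use it at compile time.
inline std::size_t element_size_of(SketchDType dtype) {
  return dispatch_floating(dtype, [](auto tag) {
    using scalar_t = decltype(tag);
    return sizeof(scalar_t);
  });
}
```

The macro form hides the switch and injects the `scalar_t` alias directly, which is why the lambda passed to it takes no arguments.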

Full API List#

For the complete list of header-only APIs, see torch/header_only_apis.txt in the PyTorch source tree.

Stable Operators#

Tensor Creation#

inline torch::stable::Tensor torch::stable::empty(torch::headeronly::IntHeaderOnlyArrayRef size, std::optional<torch::headeronly::ScalarType> dtype=std::nullopt, std::optional<torch::headeronly::Layout> layout=std::nullopt, std::optional<torch::stable::Device> device=std::nullopt, std::optional<bool> pin_memory=std::nullopt, std::optional<torch::headeronly::MemoryFormat> memory_format=std::nullopt)#

Stable version of the empty.memory_format op.

Creates a new uninitialized tensor with the specified size and options. This function supports full tensor creation options including device, dtype, layout, and memory format.

Minimum compatible version: PyTorch 2.10.

Parameters:

size – The desired size of the output tensor.
dtype – Optional scalar type for the tensor elements.
layout – Optional memory layout (e.g., strided, sparse).
device – Optional device to place the tensor on.
pin_memory – Optional flag to use pinned memory (for CUDA tensors).
memory_format – Optional memory format for the tensor.

Returns:

A new uninitialized tensor with the specified properties.

inline torch::stable::Tensor torch::stable::empty_like(const torch::stable::Tensor& self)#

Stable version of the empty_like op.

Creates a new uninitialized tensor with the same size, dtype, layout, and device as the input tensor. This version does not support kwargs (device, dtype, layout, memory_format) - kwargs support may be added in the future.

Minimum compatible version: PyTorch 2.9.

Parameters: self – The input tensor whose properties will be used for the new tensor.
Returns: A new uninitialized tensor with the same properties as self.

inline torch::stable::Tensor torch::stable::new_empty(const torch::stable::Tensor& self, torch::headeronly::IntHeaderOnlyArrayRef size, std::optional<torch::headeronly::ScalarType> dtype=std::nullopt, std::optional<torch::headeronly::Layout> layout=std::nullopt, std::optional<torch::stable::Device> device=std::nullopt, std::optional<bool> pin_memory=std::nullopt)#

Stable version of the new_empty op (2.10 version with full kwargs).

Creates a new uninitialized tensor with the specified size and options. This version supports all tensor creation kwargs. For versions < 2.10, a simpler overload that only takes dtype is available.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor whose properties may be inherited if kwargs are not provided.
size – The desired size of the output tensor.
dtype – Optional scalar type for the tensor elements.
layout – Optional memory layout (e.g., strided, sparse).
device – Optional device to place the tensor on.
pin_memory – Optional flag to use pinned memory (for CUDA tensors).

Returns:

A new uninitialized tensor with the specified properties.

inline torch::stable::Tensor torch::stable::new_zeros(const torch::stable::Tensor& self, torch::headeronly::IntHeaderOnlyArrayRef size, std::optional<torch::headeronly::ScalarType> dtype=std::nullopt, std::optional<torch::headeronly::Layout> layout=std::nullopt, std::optional<torch::stable::Device> device=std::nullopt, std::optional<bool> pin_memory=std::nullopt)#

Stable version of the new_zeros op (2.10 version with full kwargs).

Creates a new zero-filled tensor with the specified size and options. This version supports all tensor creation kwargs. For versions < 2.10, a simpler overload that only takes dtype is available.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor whose properties may be inherited if kwargs are not provided.
size – The desired size of the output tensor.
dtype – Optional scalar type for the tensor elements.
layout – Optional memory layout (e.g., strided, sparse).
device – Optional device to place the tensor on.
pin_memory – Optional flag to use pinned memory (for CUDA tensors).

Returns:

A new zero-filled tensor with the specified properties.

inline torch::stable::Tensor torch::stable::full(torch::headeronly::IntHeaderOnlyArrayRef size, double fill_value, std::optional<torch::headeronly::ScalarType> dtype=std::nullopt, std::optional<torch::headeronly::Layout> layout=std::nullopt, std::optional<torch::stable::Device> device=std::nullopt, std::optional<bool> pin_memory=std::nullopt)#

Stable version of the full.default op.

Creates a tensor of the specified size filled with the given value.

Minimum compatible version: PyTorch 2.10.

Note

The fill_value parameter is typed as double because the C shim API uses double for the Scalar parameter.

Parameters:

size – The desired size of the output tensor.
fill_value – The value to fill the tensor with.
dtype – Optional scalar type for the tensor elements.
layout – Optional memory layout.
device – Optional device to place the tensor on.
pin_memory – Optional flag to use pinned memory.

Returns:

A new tensor filled with the specified value.

torch::stable::Tensor torch::stable::from_blob(void* data, torch::headeronly::IntHeaderOnlyArrayRef sizes, torch::headeronly::IntHeaderOnlyArrayRef strides, torch::stable::Device device, torch::headeronly::ScalarType dtype, DeleterFnPtr deleter, int64_t storage_offset=0, torch::headeronly::Layout layout=torch::headeronly::Layout::Strided)#

torch::stable::Tensor torch::stable::from_blob(void* data, torch::headeronly::IntHeaderOnlyArrayRef sizes, torch::headeronly::IntHeaderOnlyArrayRef strides, torch::stable::Device device, torch::headeronly::ScalarType dtype, int64_t storage_offset=0, torch::headeronly::Layout layout=torch::headeronly::Layout::Strided)#

Tensor Manipulation#

inline torch::stable::Tensor torch::stable::clone(const torch::stable::Tensor& self)#

Stable version of the clone op.

Returns a copy of the input tensor. The returned tensor has the same data and type as the input, but is stored in a new memory location.

Minimum compatible version: PyTorch 2.9.

Note

The optional memory_format kwarg is not supported in this version.

Parameters: self – The input tensor to clone.
Returns: A new tensor with copied data.

inline torch::stable::Tensor torch::stable::contiguous(const torch::stable::Tensor& self, torch::headeronly::MemoryFormat memory_format=torch::headeronly::MemoryFormat::Contiguous)#

Stable version of the contiguous op.

Returns a tensor that is contiguous in memory and contains the same data as the input tensor. If the input tensor is already contiguous in the specified memory format, the input tensor itself is returned.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor.
memory_format – The desired memory format.

Returns:

A contiguous tensor.

inline torch::stable::Tensor torch::stable::reshape(const torch::stable::Tensor& self, torch::headeronly::IntHeaderOnlyArrayRef shape)#

Stable version of the reshape op.

Returns a tensor with the same data and number of elements as the input, but with the specified shape. When possible, the returned tensor will be a view of the input.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor.
shape – The desired output shape.

Returns:

A tensor with the specified shape.

inline torch::stable::Tensor torch::stable::view(const torch::stable::Tensor& self, torch::headeronly::IntHeaderOnlyArrayRef size)#

Stable version of the view op.

Returns a new tensor with the same data as the input tensor but with a different shape. The returned tensor shares the same data and must have the same number of elements.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor.
size – The desired output shape.

Returns:

A view tensor with the specified shape.

inline torch::stable::Tensor torch::stable::flatten(const torch::stable::Tensor& self, int64_t start_dim=0, int64_t end_dim=-1)#

Stable version of the flatten.using_ints op.

Flattens the input tensor by reshaping it into a one-dimensional tensor. If start_dim or end_dim are specified, only dimensions starting from start_dim to end_dim are flattened.

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The input tensor to flatten.
start_dim – The first dimension to flatten. Defaults to 0.
end_dim – The last dimension to flatten. Defaults to -1 (last dim).

Returns:

A flattened tensor.
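The effect of start_dim and end_dim on the output shape can be sketched without any tensor library. `flattened_sizes` below is a hypothetical helper that only computes the resulting shape; it assumes both arguments refer to valid dimensions with start_dim no later than end_dim after negative values are wrapped.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: shape produced by flattening dimensions
// [start_dim, end_dim] of a tensor with the given sizes.
inline std::vector<int64_t> flattened_sizes(const std::vector<int64_t>& sizes,
                                            int64_t start_dim = 0,
                                            int64_t end_dim = -1) {
  const int64_t ndim = static_cast<int64_t>(sizes.size());
  if (ndim == 0) {
    return {1};  // a zero-dimensional tensor flattens to one element
  }
  // Negative indices count from the end, as in the documented default of -1.
  if (start_dim < 0) start_dim += ndim;
  if (end_dim < 0) end_dim += ndim;

  // Leading dimensions are kept as they are.
  std::vector<int64_t> out(sizes.begin(), sizes.begin() + start_dim);
  // The flattened range collapses into a single dimension.
  int64_t merged = 1;
  for (int64_t d = start_dim; d <= end_dim; ++d) {
    merged *= sizes[static_cast<std::size_t>(d)];
  }
  out.push_back(merged);
  // Trailing dimensions are kept as they are.
  out.insert(out.end(), sizes.begin() + end_dim + 1, sizes.end());
  return out;
}
```

For sizes {2, 3, 4}, the defaults give {24}, while start_dim=1 gives {2, 12}.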

inline torch::stable::Tensor torch::stable::squeeze(const torch::stable::Tensor& self, int64_t dim)#

Stable version of the squeeze.dim op.

Returns a tensor with the dimension of size one at the specified position removed. The returned tensor shares the same underlying data with the input tensor.

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The input tensor.
dim – The dimension to squeeze. If this dimension does not have size 1, the tensor is returned unchanged.

Returns:

A tensor with the specified dimension removed (if size was 1).

inline torch::stable::Tensor torch::stable::unsqueeze(const torch::stable::Tensor& self, int64_t dim)#

Stable version of the unsqueeze op.

Returns a new tensor with a dimension of size one inserted at the specified position. The returned tensor shares the same underlying data with the input tensor.

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The input tensor.
dim – The index at which to insert the new dimension. Negative values are supported.

Returns:

A tensor with an additional dimension.

inline torch::stable::Tensor torch::stable::transpose(const torch::stable::Tensor& self, int64_t dim0, int64_t dim1)#

Stable version of the transpose.int op.

Returns a tensor that is a transposed version of the input, with dimensions dim0 and dim1 swapped. The returned tensor shares storage with the input.

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The input tensor.
dim0 – The first dimension to transpose.
dim1 – The second dimension to transpose.

Returns:

A transposed view of the input tensor.

inline torch::stable::Tensor torch::stable::select(const torch::stable::Tensor& self, int64_t dim, int64_t index)#

Stable version of the select.int op.

Slices the input tensor along the specified dimension at the given index. This function returns a view of the original tensor with the given dimension removed.

Minimum compatible version: PyTorch 2.9.

Note

The index parameter is typed as int64_t rather than SymInt, because SymInt is not yet header-only.

Parameters:

self – The input tensor.
dim – The dimension to slice.
index – The index to select along the dimension.

Returns:

A tensor with one fewer dimension.

inline torch::stable::Tensor torch::stable::narrow(torch::stable::Tensor& self, int64_t dim, int64_t start, int64_t length)#

Stable version of the narrow.default op.

Returns a new tensor that is a narrowed version of the input tensor. The dimension dim is narrowed from start to start + length.

Minimum compatible version: PyTorch 2.9.

Note

The start and length parameters are typed as int64_t rather than SymInt, because SymInt is not yet header-only.

Parameters:

self – The input tensor to narrow.
dim – The dimension along which to narrow.
start – The starting index for the narrowed dimension.
length – The length of the narrowed dimension.

Returns:

A new tensor that is a narrowed view of the input.
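One way to picture a narrowed view is as a change to tensor metadata only: the size of the chosen dimension shrinks and the storage offset advances. The sketch below models that with a hypothetical `ViewMeta` struct and `narrow_meta` helper; it assumes a strided layout and an in-range, non-negative dim, and it does not call the stable API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical metadata bundle for a strided tensor view.
struct ViewMeta {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
  int64_t storage_offset;
};

// Hypothetical helper: metadata of the view covering indices
// [start, start + length) along `dim`. No data is copied.
inline ViewMeta narrow_meta(ViewMeta meta, int64_t dim, int64_t start,
                            int64_t length) {
  const std::size_t d = static_cast<std::size_t>(dim);
  // Skipping `start` entries along `dim` moves the first element forward.
  meta.storage_offset += start * meta.strides[d];
  // Only the narrowed dimension changes size; strides are untouched.
  meta.sizes[d] = length;
  return meta;
}
```

Narrowing a 4x5 row-major tensor along dimension 1 from start 2 with length 3 yields sizes {4, 3}, unchanged strides {5, 1}, and a storage offset of 2.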

inline torch::stable::Tensor torch::stable::pad(const torch::stable::Tensor& self, torch::headeronly::IntHeaderOnlyArrayRef pad, const std::string& mode="constant", double value=0.0)#

Stable version of the pad.default op.

Pads the input tensor according to the specified padding sizes. The padding is applied symmetrically to each dimension, with the padding sizes specified in reverse order (last dimension first).

Minimum compatible version: PyTorch 2.9.

Note

The pad parameter is typed as IntHeaderOnlyArrayRef because SymInt is not yet header-only.

Parameters:

self – The input tensor to pad.
pad – The padding sizes for each dimension (in pairs, starting from the last dimension).
mode – The padding mode: “constant”, “reflect”, “replicate”, or “circular”. Defaults to “constant”.
value – The fill value for constant padding. Defaults to 0.0.

Returns:

A new padded tensor.
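The "pairs, starting from the last dimension" convention is easy to get backwards, so the sketch below computes only the output shape implied by a pad list. `padded_sizes` is a hypothetical helper, not part of the stable API, and it ignores the mode and value arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: output shape after padding. `pad` is laid out as
// {before_last, after_last, before_second_to_last, after_second_to_last, ...}.
inline std::vector<int64_t> padded_sizes(std::vector<int64_t> sizes,
                                         const std::vector<int64_t>& pad) {
  const std::size_t ndim = sizes.size();
  // Consume the pad list two entries at a time, walking dimensions backward.
  for (std::size_t p = 0; p + 1 < pad.size() && p / 2 < ndim; p += 2) {
    const std::size_t d = ndim - 1 - p / 2;
    sizes[d] += pad[p] + pad[p + 1];
  }
  return sizes;
}
```

For a 2x3 input, pad {1, 2} grows only the last dimension to give {2, 6}, while pad {1, 1, 2, 0} gives {4, 5}.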

Device and Type Conversion#

inline torch::stable::Tensor torch::stable::to(const torch::stable::Tensor& self, std::optional<torch::headeronly::ScalarType> dtype=std::nullopt, std::optional<torch::headeronly::Layout> layout=std::nullopt, std::optional<torch::stable::Device> device=std::nullopt, std::optional<bool> pin_memory=std::nullopt, bool non_blocking=false, bool copy=false, std::optional<torch::headeronly::MemoryFormat> memory_format=std::nullopt)#

Stable version of the to.dtype_layout op.

Converts a tensor to the specified dtype, layout, device, and/or memory format. Returns a new tensor with the specified properties.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor.
dtype – Optional target scalar type.
layout – Optional target memory layout.
device – Optional target device.
pin_memory – Optional flag to use pinned memory.
non_blocking – If true, the operation may be asynchronous. Defaults to false.
copy – If true, always create a copy. Defaults to false.
memory_format – Optional target memory format.

Returns:

A tensor with the specified properties.

inline torch::stable::Tensor torch::stable::to(const torch::stable::Tensor& self, torch::stable::Device device, bool non_blocking=false, bool copy=false)#

Convenience overload for moving a tensor to a device.

Moves the tensor to the specified device. This is a convenience wrapper around the full to() function.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor.
device – The target device.
non_blocking – If true, the operation may be asynchronous. Defaults to false.
copy – If true, always create a copy. Defaults to false.

Returns:

A tensor on the specified device.

inline torch::stable::Tensor torch::stable::fill_(const torch::stable::Tensor& self, double value)#

Stable version of the fill_.Scalar op.

Fills the input tensor with the specified scalar value in-place and returns it. This has identical semantics to the existing fill_.Scalar op.

Minimum compatible version: PyTorch 2.9.

Note

The value parameter is typed as double because Scalar.h is currently not header-only.

Parameters:

self – The tensor to fill.
value – The scalar value to fill the tensor with.

Returns:

The input tensor, now filled with the specified value.

inline torch::stable::Tensor torch::stable::zero_(torch::stable::Tensor& self)#

Stable version of the zero_ op.

Fills the input tensor with zeros in-place and returns it. Unlike the tensor method version (t.zero_()), this is called as a function: zero_(t).

Minimum compatible version: PyTorch 2.9.

Parameters: self – The tensor to fill with zeros.
Returns: The input tensor, now filled with zeros.

inline torch::stable::Tensor torch::stable::copy_(torch::stable::Tensor& self, const torch::stable::Tensor& src, std::optional<bool> non_blocking=std::nullopt)#

Stable version of the copy_ op.

Copies the elements from the source tensor into the destination tensor in-place and returns the destination tensor. The tensors must be broadcastable.

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The destination tensor (modified in-place).
src – The source tensor to copy from.
non_blocking – If true, the copy may occur asynchronously with respect to the host. Defaults to false.

Returns:

The destination tensor with copied values.

inline torch::stable::Tensor torch::stable::matmul(const torch::stable::Tensor& self, const torch::stable::Tensor& other)#

Stable version of the matmul op.

Performs matrix multiplication between two tensors. The behavior depends on the dimensionality of the tensors (see PyTorch documentation for details on broadcasting rules for matmul).

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The first input tensor.
other – The second input tensor.

Returns:

The result of matrix multiplication.
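For the simplest case, two 2-D inputs, the computation reduces to an ordinary matrix product. The following standalone sketch covers that case only; it is plain C++ rather than the PyTorch API, `matmul_2d` is a hypothetical name, and the batched and 1-D cases are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the torch::stable API): row-major (m x k) times
// (k x n) product, returned as a row-major (m x n) buffer.
std::vector<double> matmul_2d(const std::vector<double>& a,
                              const std::vector<double>& b,
                              std::size_t m, std::size_t k, std::size_t n) {
  assert(a.size() == m * k);
  assert(b.size() == k * n);
  std::vector<double> out(m * n, 0.0);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      for (std::size_t j = 0; j < n; ++j) {
        out[i * n + j] += a[i * k + p] * b[p * n + j];
      }
    }
  }
  return out;
}
```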

inline torch::stable::Tensor torch::stable::amax(const torch::stable::Tensor &self, int64_t dim, bool keepdim = false)#

Stable version of the amax.default op (single dimension).

Computes the maximum value along the specified dimension. If keepdim is true, the output tensor has the same number of dimensions as the input, with the reduced dimension having size 1. Otherwise, the reduced dimension is removed.

Minimum compatible version: PyTorch 2.9.

Parameters:

self – The input tensor.
dim – The dimension along which to compute the maximum.
keepdim – Whether to retain the reduced dimension. Defaults to false.

Returns:

A tensor containing the maximum values along the specified dimension.
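The effect of keepdim on the output shape is easiest to see on a small 2-D input. The sketch below is plain C++ and does not call the PyTorch API; the `Reduced` struct and `amax_2d` are hypothetical names, and the row-major layout is an assumption of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: the reduced values plus the shape they carry.
struct Reduced {
  std::vector<int64_t> shape;
  std::vector<float> values;
};

// Hypothetical helper (not the torch::stable API): maximum of a row-major
// (rows x cols) buffer along `dim`, which must be 0 or 1.
Reduced amax_2d(const std::vector<float>& data, int64_t rows, int64_t cols,
                int64_t dim, bool keepdim = false) {
  assert(rows > 0 && cols > 0);
  assert(static_cast<int64_t>(data.size()) == rows * cols);
  assert(dim == 0 || dim == 1);
  const int64_t out_len = (dim == 0) ? cols : rows;
  const int64_t red_len = (dim == 0) ? rows : cols;
  Reduced r;
  r.values.reserve(static_cast<std::size_t>(out_len));
  for (int64_t o = 0; o < out_len; ++o) {
    // Start from the first element of the slice, then scan the rest.
    float best = (dim == 0) ? data[o] : data[o * cols];
    for (int64_t i = 1; i < red_len; ++i) {
      const float v = (dim == 0) ? data[i * cols + o] : data[o * cols + i];
      best = std::max(best, v);
    }
    r.values.push_back(best);
  }
  if (keepdim) {
    // The reduced dimension stays in the shape with size 1.
    if (dim == 0) {
      r.shape = std::vector<int64_t>{1, cols};
    } else {
      r.shape = std::vector<int64_t>{rows, 1};
    }
  } else {
    // The reduced dimension is dropped.
    r.shape = std::vector<int64_t>{out_len};
  }
  return r;
}
```

The values are the same in both modes; only the reported shape differs.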

inline torch::stable::Tensor torch::stable::amax(const torch::stable::Tensor &self, torch::headeronly::IntHeaderOnlyArrayRef dims, bool keepdim = false)#

Stable version of the amax.default op (multiple dimensions).

Computes the maximum value reducing over all the specified dimensions. If keepdim is true, the output tensor has the same number of dimensions as the input, with the reduced dimensions having size 1. Otherwise, the reduced dimensions are removed.

Minimum compatible version: PyTorch 2.9.

Note

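The dims parameter is typed as torch::headeronly::IntHeaderOnlyArrayRef because the array reference type used by the underlying op is not yet header-only.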

Parameters:

self – The input tensor.
dims – The dimensions along which to compute the maximum.
keepdim – Whether to retain the reduced dimensions. Defaults to false.

Returns:

A tensor containing the maximum values.
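When several dimensions are reduced at once, the output shape follows the same keepdim rule applied to each listed dimension. A minimal sketch of that shape rule, in plain C++ and independent of the PyTorch headers, might look as follows; `reduced_shape` is a hypothetical helper and it assumes the dimensions are non-negative and in range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the torch::stable API): shape that results from
// reducing `shape` over every dimension listed in `dims`.
std::vector<int64_t> reduced_shape(const std::vector<int64_t>& shape,
                                   const std::vector<int64_t>& dims,
                                   bool keepdim = false) {
  for (int64_t d : dims) {
    assert(d >= 0 && d < static_cast<int64_t>(shape.size()));
  }
  std::vector<int64_t> out;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    const bool is_reduced =
        std::find(dims.begin(), dims.end(), static_cast<int64_t>(d)) !=
        dims.end();
    if (!is_reduced) {
      out.push_back(shape[d]);  // untouched dimension keeps its size
    } else if (keepdim) {
      out.push_back(1);  // reduced dimension kept with size 1
    }
    // Otherwise the reduced dimension is removed from the shape.
  }
  return out;
}
```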

inline torch::stable::Tensor torch::stable::sum(const torch::stable::Tensor &self, std::optional<torch::headeronly::IntHeaderOnlyArrayRef> dim = std::nullopt, bool keepdim = false, std::optional<torch::headeronly::ScalarType> dtype = std::nullopt)#

Stable version of the sum.dim_IntList op.

Computes the sum of the input tensor along the specified dimensions. If dim is not provided, sums over all dimensions.

Minimum compatible version: PyTorch 2.10.

Parameters:

self – The input tensor.
dim – Optional dimensions to reduce. If not provided, reduces all dimensions.
keepdim – Whether to retain the reduced dimensions. Defaults to false.
dtype – Optional output dtype. If not provided, uses the input dtype.

Returns:

A tensor containing the sum.
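The behavior of an omitted dim can be sketched with std::optional. The example below is plain C++ rather than the PyTorch API, and it simplifies the interface: the hypothetical `sum_2d` accepts at most one dimension instead of a list, and works on a row-major 2-D buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not the torch::stable API): sum of a row-major
// (rows x cols) buffer. With no `dim`, all elements are reduced into one
// value; with dim 0 or 1, only that dimension is reduced.
std::vector<double> sum_2d(const std::vector<double>& data, std::size_t rows,
                           std::size_t cols,
                           std::optional<int64_t> dim = std::nullopt) {
  assert(data.size() == rows * cols);
  assert(!dim.has_value() || *dim == 0 || *dim == 1);
  std::size_t out_len = 1;
  if (dim.has_value()) {
    out_len = (*dim == 0) ? cols : rows;
  }
  std::vector<double> out(out_len, 0.0);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      std::size_t idx = 0;  // single accumulator when no dim is given
      if (dim.has_value()) {
        idx = (*dim == 0) ? c : r;
      }
      out[idx] += data[r * cols + c];
    }
  }
  return out;
}
```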

inline torch::stable::Tensor &torch::stable::sum_out(torch::stable::Tensor &out, const torch::stable::Tensor &self, std::optional<torch::headeronly::IntHeaderOnlyArrayRef> dim = std::nullopt, bool keepdim = false, std::optional<torch::headeronly::ScalarType> dtype = std::nullopt)#

Stable version of the sum.IntList_out op.

Computes the sum of the input tensor along the specified dimensions, storing the result in the provided output tensor. Following C++ convention, the out parameter comes first.

Minimum compatible version: PyTorch 2.10.

Parameters:

out – The output tensor (modified in-place).
self – The input tensor.
dim – Optional dimensions to reduce.
keepdim – Whether to retain the reduced dimensions. Defaults to false.
dtype – Optional output dtype.

Returns:

Reference to the output tensor.
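The out-first calling convention, with the output returned by reference, can be shown in isolation. This sketch is plain C++ and not the PyTorch API; `row_sum_out` is a hypothetical helper, and requiring the caller to size the output buffer up front is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the torch::stable API): writes the per-row sums
// of a row-major (rows x cols) buffer into a caller-provided `out`, which
// comes first in the parameter list and is returned by reference.
std::vector<double>& row_sum_out(std::vector<double>& out,
                                 const std::vector<double>& data,
                                 std::size_t rows, std::size_t cols) {
  assert(out.size() == rows);
  assert(data.size() == rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    double acc = 0.0;
    for (std::size_t c = 0; c < cols; ++c) {
      acc += data[r * cols + c];
    }
    out[r] = acc;  // previous contents of `out` are overwritten
  }
  return out;
}
```

Because the same buffer is handed back, a caller can reuse one allocation across repeated reductions.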

inline torch::stable::Tensor torch::stable::subtract(const torch::stable::Tensor &self, const torch::stable::Tensor &other, double alpha = 1.0)#

Stable version of the subtract.Tensor op.

Subtracts the other tensor from self, with an optional scaling factor alpha. Computes: self - alpha * other.

Minimum compatible version: PyTorch 2.10.

Note

The alpha parameter is typed as double because the underlying API uses double for the Scalar parameter.

Parameters:

self – The input tensor.
other – The tensor to subtract.
alpha – The scaling factor for other. Defaults to 1.0.

Returns:

The result of self - alpha * other.
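The formula self - alpha * other can be checked on a small elementwise example. The sketch is plain C++ and independent of the PyTorch headers; `subtract_scaled` is a hypothetical name, and equal-length inputs (no broadcasting) are an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the torch::stable API): elementwise
// self - alpha * other for two buffers of equal length.
std::vector<double> subtract_scaled(const std::vector<double>& self,
                                    const std::vector<double>& other,
                                    double alpha = 1.0) {
  assert(self.size() == other.size());
  std::vector<double> out(self.size());
  for (std::size_t i = 0; i < self.size(); ++i) {
    out[i] = self[i] - alpha * other[i];
  }
  return out;
}
```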

Parallelization Utilities#

template<class F> inline void torch::stable::parallel_for(const int64_t begin, const int64_t end, const int64_t grain_size, const F &f)#

Stable parallel_for utility.

Provides a stable interface to at::parallel_for for parallel execution. The function f will be called with (begin, end) ranges to process in parallel. grain_size controls the minimum work size per thread for efficient parallelization.

Minimum compatible version: PyTorch 2.10.

Template Parameters:

F – The callable type.

Parameters:

begin – The start of the iteration range.
end – The end of the iteration range (exclusive).
grain_size – The minimum number of iterations per thread.
f – The function to execute in parallel.

inline uint32_t torch::stable::get_num_threads()#

Gets the number of threads for the parallel backend.

Provides a stable interface to at::get_num_threads.

Minimum compatible version: PyTorch 2.10.

Returns:

The number of threads.
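The relationship between parallel_for's grain_size and a thread count can be modeled with a sequential sketch. It is plain C++ and is not how at::parallel_for is implemented; `chunked_for` and its `max_chunks` parameter are hypothetical, with `max_chunks` standing in for a thread count such as the one get_num_threads reports. A real backend would run the sub-ranges on worker threads instead of one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model (not the torch::stable API): calls f(lo, hi) over
// consecutive sub-ranges of [begin, end). Every chunk except possibly the
// last holds at least `grain_size` iterations, and no more than
// `max_chunks` chunks are produced. Returns the number of chunks.
template <class F>
int64_t chunked_for(int64_t begin, int64_t end, int64_t grain_size,
                    int64_t max_chunks, const F& f) {
  assert(grain_size > 0 && max_chunks > 0);
  if (begin >= end) {
    return 0;  // empty range: f is never called
  }
  const int64_t n = end - begin;
  // Ceiling division: the chunk size at which max_chunks chunks cover n.
  const int64_t even_split = (n + max_chunks - 1) / max_chunks;
  const int64_t chunk = std::max(grain_size, even_split);
  int64_t count = 0;
  for (int64_t lo = begin; lo < end; lo += chunk) {
    f(lo, std::min(lo + chunk, end));
    ++count;
  }
  return count;
}
```

A large grain_size collapses the range into a single call, which is the case where parallel dispatch would not pay for itself.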
