torch.cuda.memory.memory_stats
- torch.cuda.memory.memory_stats(device=None)
Return a dictionary of CUDA memory allocator statistics for a given device.
The return value of this function is a dictionary of statistics, each of which is a non-negative integer.
Core statistics:
- "allocated.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of allocation requests received by the memory allocator.
- "allocated_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of allocated memory.
- "segment.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of reserved segments from cudaMalloc().
- "reserved_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of reserved memory.
- "active.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of active memory blocks.
- "active_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of active memory.
- "inactive_split.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of inactive, non-releasable memory blocks.
- "inactive_split_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of inactive, non-releasable memory.
For these core statistics, values are broken down as follows.
Pool type:
- all: combined statistics across all memory pools.
- large_pool: statistics for the large allocation pool (as of June 2025, for size >= 1MB allocations).
- small_pool: statistics for the small allocation pool (as of June 2025, for size < 1MB allocations).
Metric type:
- current: current value of this metric.
- peak: maximum value of this metric.
- allocated: historical total increase in this metric.
- freed: historical total decrease in this metric.
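The key pattern described above can be expanded mechanically. The following is a minimal sketch, not part of the PyTorch API: `core_stat_key` and `expand_core_keys` are hypothetical helper names that only build the key strings from the pool and metric types listed in this section.

```python
from itertools import product

# Name parts taken from the pool type and metric type lists above.
POOLS = ("all", "large_pool", "small_pool")
METRICS = ("current", "peak", "allocated", "freed")


def core_stat_key(stat, pool="all", metric="current"):
    """Build one core-statistic key, e.g. 'allocated_bytes.all.current'."""
    if pool not in POOLS:
        raise ValueError(f"unknown pool type: {pool!r}")
    if metric not in METRICS:
        raise ValueError(f"unknown metric type: {metric!r}")
    return f"{stat}.{pool}.{metric}"


def expand_core_keys(stat):
    """List every pool/metric combination for one core statistic."""
    return [core_stat_key(stat, p, m) for p, m in product(POOLS, METRICS)]


print(core_stat_key("reserved_bytes", "large_pool", "peak"))
print(len(expand_core_keys("active_bytes")))
```

With a CUDA device available, a value could then be read with something like `torch.cuda.memory_stats().get(core_stat_key("allocated_bytes"), 0)`. Using `.get` with a default is a precaution here, in case the dictionary does not contain the key.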
In addition to the core statistics, we also provide some simple event counters:
- "num_alloc_retries": number of failed cudaMalloc calls that result in a cache flush and retry.
- "num_ooms": number of out-of-memory errors thrown.
- "num_sync_all_streams": number of synchronize_and_free_events calls.
- "num_device_alloc": number of CUDA allocation calls. This includes both cuMemMap and cudaMalloc.
- "num_device_free": number of CUDA free calls. This includes both cuMemUnmap and cudaFree.
The caching allocator can be configured via ENV to not split blocks larger than a defined size (see Memory Management section of the CUDA Semantics documentation). This helps avoid memory fragmentation but may have a performance penalty. Additional outputs to assist with tuning and evaluating impact:
- "max_split_size": blocks above this size will not be split.
- "oversize_allocations.{current,peak,allocated,freed}": number of over-size allocation requests received by the memory allocator.
- "oversize_segments.{current,peak,allocated,freed}": number of over-size reserved segments from cudaMalloc().
The caching allocator can be configured via ENV to round memory allocations in order to reduce fragmentation. Sometimes the overhead from rounding can be higher than the fragmentation it helps reduce. The following stat can be used to check if rounding adds too much overhead:
- "requested_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": memory requested by client code. Compare this with allocated_bytes to check if allocation rounding adds too much overhead.
- Parameters
device (torch.device or int, optional) – selected device. Returns statistics for the current device, given by current_device(), if device is None (default).
- Return type
dict[str, Any]
Note
See Memory management for more details about GPU memory management.
Note
With backend:cudaMallocAsync, some stats are not meaningful, and are always reported as zero.