torch.monitor#
Created On: Jan 12, 2022 | Last Updated On: Jun 11, 2025
Warning
This module is a prototype release, and its interfaces and functionality may change without warning in future PyTorch releases.
torch.monitor provides an interface for logging events and counters from PyTorch.
The stat interfaces are designed for tracking high-level metrics that are periodically logged out for monitoring system performance. Since the stats aggregate with a specific window size, you can log to them from critical loops with minimal performance impact.
For more infrequent events or values, such as loss, accuracy, or usage tracking, the event interface can be used directly.
Event handlers can be registered to handle the events and pass them to an external event sink.
API Reference#
- class torch.monitor.Aggregation#
These are types of aggregations that can be used to accumulate stats.
Members:
- VALUE :
VALUE returns the last value to be added.
- MEAN :
MEAN computes the arithmetic mean of all the added values.
- COUNT :
COUNT returns the total number of added values.
- SUM :
SUM returns the sum of the added values.
- MAX :
MAX returns the max of the added values.
- MIN :
MIN returns the min of the added values.
- property name#
- class torch.monitor.Stat#
Stat is used to compute summary statistics in a performant way over fixed intervals. Stat logs the statistics as an Event once every window_size duration. When the window closes, the stats are logged via the event handlers as a torch.monitor.Stat event.

window_size should be set to something relatively high, e.g. 60 s, to avoid a huge number of events being logged. Stat uses millisecond precision.

If max_samples is set, the stat will cap the number of samples per window by discarding add calls once max_samples adds have occurred. If it is not set, all add calls during the window will be included. This is an optional field that makes aggregations more directly comparable across windows when the number of samples might vary.

When the Stat is destructed it will log any remaining data, even if the window hasn't elapsed.
- __init__(self: torch._C._monitor.Stat, name: str, aggregations: collections.abc.Sequence[torch._C._monitor.Aggregation], window_size: datetime.timedelta, max_samples: SupportsInt = 9223372036854775807) → None#

Constructs the Stat.
- add(self: torch._C._monitor.Stat, v: SupportsFloat) → None#

Adds a value to the stat to be aggregated according to the configured stat type and aggregations.
- property count#

Number of data points that have currently been collected. Resets once the event has been logged.
- get(self: torch._C._monitor.Stat) → dict[torch._C._monitor.Aggregation, float]#

Returns the current value of the stat, primarily for testing purposes. If the stat has logged and no additional values have been added, this will be zero.
- property name#
The name of the stat that was set during creation.
- class torch.monitor.data_value_t#

data_value_t is one of str, float, int, bool.
- class torch.monitor.Event#

Event represents a specific typed event to be logged. This can represent high-level data points such as loss or accuracy per epoch, or lower-level aggregations such as those produced by the Stats provided through this library.

All Events of the same type should have the same name so downstream handlers can correctly process them.
- __init__(self: torch._C._monitor.Event, name: str, timestamp: datetime.datetime, data: collections.abc.Mapping[str, data_value_t]) → None#

Constructs the Event.
- property data#

The structured data contained within the Event.
- property name#

The name of the Event.
- property timestamp#

The timestamp when the Event happened.
- class torch.monitor.EventHandlerHandle#

EventHandlerHandle is a wrapper type returned by register_event_handler and used to unregister the handler via unregister_event_handler. It cannot be directly initialized.
- torch.monitor.log_event(event: torch._C._monitor.Event) → None#

log_event logs the specified event to all of the registered event handlers. It is up to the event handlers to log the event out to the corresponding event sink.

If there are no event handlers registered, this method is a no-op.
- torch.monitor.register_event_handler(callback: collections.abc.Callable[[torch._C._monitor.Event], None]) → torch._C._monitor.EventHandlerHandle#

register_event_handler registers a callback to be called whenever an event is logged via log_event. These handlers should avoid blocking the main thread, since they run during the log_event call and may interfere with training.
- torch.monitor.unregister_event_handler(handler: torch._C._monitor.EventHandlerHandle) → None#

unregister_event_handler unregisters the EventHandlerHandle returned after calling register_event_handler. After this returns, the event handler will no longer receive events.
- class torch.monitor.TensorboardEventHandler(writer)[source]#

TensorboardEventHandler is an event handler that will write known events to the provided SummaryWriter.

This currently only supports torch.monitor.Stat events, which are logged as scalars.

Example

```python
>>> from torch.utils.tensorboard import SummaryWriter
>>> from torch.monitor import TensorboardEventHandler, register_event_handler
>>> writer = SummaryWriter("log_dir")
>>> register_event_handler(TensorboardEventHandler(writer))
```