
Events

Created On: May 04, 2021 | Last Updated On: Jun 10, 2024

This module contains event-processing mechanisms that are integrated with the standard Python logging module.

Example of usage:

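    from torch.distributed.elastic import events

    event = events.Event(
        name="test_event",
        source=events.EventSource.WORKER,
        metadata={...},  # str -> str | int | float | bool | None
    )
    # record() logs the serialized event through the handler for the
    # destination; "console" writes to stderr.
    events.record(event, destination="console")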
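Since events flow through the standard logging machinery, the general pattern can be sketched with the standard library alone. This is a hypothetical illustration, not torch's implementation: the names HANDLERS, get_events_logger, record_sketch, and the "memory" destination are assumptions made for the example. Each destination name selects a logging.Handler, and one cached, non-propagating logger per destination emits each event as a single JSON line.

```python
import io
import json
import logging
from typing import Dict

# Hypothetical destination table: each name selects a standard logging.Handler.
# (Illustrative only; not the handler table that torch ships with.)
HANDLERS: Dict[str, logging.Handler] = {"null": logging.NullHandler()}
_LOGGERS: Dict[str, logging.Logger] = {}


def get_events_logger(destination: str = "null") -> logging.Logger:
    """Return a per-destination logger, creating and caching it on first use."""
    if destination not in _LOGGERS:
        logger = logging.getLogger(f"events-sketch-{destination}")
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep event records out of the root logger
        logger.addHandler(HANDLERS[destination])  # KeyError for unknown names
        _LOGGERS[destination] = logger
    return _LOGGERS[destination]


def record_sketch(payload: Dict[str, object], destination: str = "null") -> None:
    """Log one event payload as a single JSON line."""
    get_events_logger(destination).info(json.dumps(payload, sort_keys=True))


# Usage: register an in-memory destination, then record into it.
buffer = io.StringIO()
HANDLERS["memory"] = logging.StreamHandler(buffer)
record_sketch({"name": "test_event", "source": "WORKER"}, destination="memory")
record_sketch({"name": "dropped"})  # "null" destination: the record is discarded
```

Routing through logging means the choice of sink (console, file, nothing) is just a choice of handler, and the caller code stays the same.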

API Methods

torch.distributed.elastic.events.record(event, destination='null')
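
A short usage sketch for record(). In current releases it appears to log the event's JSON serialization at INFO level through a logger whose handler is selected by destination, so with the default "null" destination nothing is emitted. To capture the "console" output (for example in a test), the sketch temporarily points the shared console handler at an in-memory buffer with logging.StreamHandler.setStream; this assumes get_logging_handler("console") returns a StreamHandler with no custom formatter, which is an observation about current releases rather than a documented guarantee. The event name and metadata are illustrative.

```python
import io
import json

from torch.distributed.elastic import events

event = events.Event(
    name="checkpoint_saved",  # illustrative event name
    source=events.EventSource.AGENT,
    metadata={"step": 100},
)

# The default destination is "null": the event is discarded.
events.record(event)

# Assumption: the "console" handler is a logging.StreamHandler, so its
# stream can be swapped temporarily to capture what record() writes.
console = events.get_logging_handler(destination="console")
buffer = io.StringIO()
previous = console.setStream(buffer)
try:
    events.record(event, destination="console")
finally:
    console.setStream(previous)  # restore the original stream (stderr)

# Each recorded event is one JSON line.
captured = json.loads(buffer.getvalue())
```
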

torch.distributed.elastic.events.construct_and_record_rdzv_event(run_id, message, node_state, name='', hostname='', pid=None, master_endpoint='', local_id=None, rank=None)

Initialize a rendezvous event object and record it.

Parameters:
  • run_id (str) – The run id of the rendezvous.

  • message (str) – The message describing the event.

  • node_state (NodeState) – The state of the node (INIT, RUNNING, SUCCEEDED, FAILED).

  • name (str) – Event name (e.g. the current action being performed).

  • hostname (str) – Hostname of the node.

  • pid (Optional[int]) – The process id of the node.

  • master_endpoint (str) – The master endpoint for the rendezvous store, if known.

  • local_id (Optional[int]) – The local_id of the node, if defined in dynamic_rendezvous.py.

  • rank (Optional[int]) – The rank of the node, if known.

Returns:

None

Return type:

None

Example

    >>> # See DynamicRendezvousHandler class
    >>> def _record(
    ...     self,
    ...     message: str,
    ...     node_state: NodeState = NodeState.RUNNING,
    ...     rank: Optional[int] = None,
    ... ) -> None:
    ...     construct_and_record_rdzv_event(
    ...         name=f"{self.__class__.__name__}.{get_method_name()}",
    ...         run_id=self._settings.run_id,
    ...         message=message,
    ...         node_state=node_state,
    ...         hostname=self._this_node.addr,
    ...         pid=self._this_node.pid,
    ...         local_id=self._this_node.local_id,
    ...         rank=rank,
    ...     )
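
construct_and_record_rdzv_event can also be called directly, outside a rendezvous handler class. The sketch below uses a hypothetical run ID and message and leaves name, hostname, and pid at their defaults. Note that in current releases the function appears to return early, recording nothing, when the handler for the "dynamic_rendezvous" destination is a logging.NullHandler (which seems to be the default), so this call may produce no output at all.

```python
from torch.distributed.elastic.events import construct_and_record_rdzv_event
from torch.distributed.elastic.events.api import NodeState

# Hypothetical values; in practice they come from the rendezvous settings.
result = construct_and_record_rdzv_event(
    run_id="example_run",
    message="node joined the rendezvous",
    node_state=NodeState.RUNNING,
    rank=0,
)
# The function returns None whether or not the event was emitted.
```
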

torch.distributed.elastic.events.get_logging_handler(destination='null')

Return type:

logging.Handler
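
Because get_logging_handler returns an ordinary logging.Handler, it can be attached to an application's own logger. A sketch, assuming the "null" and "console" destinations used elsewhere on this page; the logger name "my_app.events" is hypothetical. Logging event.serialize() keeps each record a single JSON line.

```python
import logging

from torch.distributed.elastic import events

# The default destination is "null".
null_handler = events.get_logging_handler()

# Attach the console handler to a hypothetical application logger.
app_logger = logging.getLogger("my_app.events")
app_logger.setLevel(logging.INFO)
app_logger.propagate = False  # avoid duplicate output via the root logger
app_logger.addHandler(events.get_logging_handler(destination="console"))

event = events.Event(name="test_event", source=events.EventSource.WORKER)
app_logger.info(event.serialize())  # one JSON line on stderr
```
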

Event Objects

class torch.distributed.elastic.events.api.Event(name, source, timestamp=0, metadata=<factory>)

This class represents a generic event that occurs during the execution of a torchelastic job.

An event can be any kind of meaningful action.

Parameters:
  • name (str) – Event name.

  • source (EventSource) – The event producer, e.g. agent or worker.

  • timestamp (int) – Timestamp, in milliseconds, at which the event occurred.

  • metadata (dict[str, str | int | float | bool | None]) – Additional data associated with the event.
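
Since Event is a dataclass with JSON-compatible fields, it can be serialized and restored. This sketch assumes the serialize() method and the Event.deserialize() static method that Event provides in current releases (they are not listed on this page). The timestamp is not filled in automatically (it defaults to 0), so the example sets it from time.time() in milliseconds; the name and metadata values are illustrative.

```python
import json
import time

from torch.distributed.elastic.events.api import Event, EventSource

event = Event(
    name="worker_started",  # illustrative event name
    source=EventSource.WORKER,
    timestamp=int(time.time() * 1000),  # milliseconds; defaults to 0 otherwise
    metadata={"rank": 0, "host": "node-0", "restarted": False, "exit_code": None},
)

payload = event.serialize()  # JSON string
decoded = json.loads(payload)  # "source" is stored as its string value

restored = Event.deserialize(payload)  # back to an Event instance
```
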

class torch.distributed.elastic.events.api.EventSource(value)

Known identifiers of the event producers.
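
EventSource is an enumeration. The sketch below assumes its members are AGENT and WORKER (WORKER appears in the usage example at the top of this page, and the Event parameters mention agents and workers). Members can be looked up by name, which is convenient when reading back a serialized source field.

```python
from torch.distributed.elastic.events.api import EventSource

# List the known event producers.
names = [source.name for source in EventSource]

# Look up a member from its name, e.g. a value read back from JSON.
source = EventSource["AGENT"]
is_agent = source is EventSource.AGENT
```
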

torch.distributed.elastic.events.api.EventMetadataValue

alias of Optional[Union[str, int, float, bool]]
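
EventMetadataValue is a plain typing alias, so it is mainly useful in annotations. The sketch below uses it to type a metadata dictionary and adds a small runtime check; the helpers is_metadata_value and build_metadata are hypothetical and not part of the torch API.

```python
from typing import Dict, cast

from torch.distributed.elastic.events.api import EventMetadataValue

# Runtime types permitted by the alias (None is handled separately).
_ALLOWED_TYPES = (str, int, float, bool)


def is_metadata_value(value: object) -> bool:
    """Hypothetical helper: True if value matches EventMetadataValue."""
    return value is None or isinstance(value, _ALLOWED_TYPES)


def build_metadata(**fields: object) -> Dict[str, EventMetadataValue]:
    """Hypothetical helper: build an Event metadata dict, rejecting other types."""
    bad = [key for key, value in fields.items() if not is_metadata_value(value)]
    if bad:
        raise TypeError(f"unsupported metadata values for keys: {bad}")
    return cast(Dict[str, EventMetadataValue], dict(fields))


metadata = build_metadata(rank=0, host="node-0", elapsed_s=1.5, ok=True, note=None)
```

Nested containers such as lists or dicts are rejected, which matches the flat value types the alias allows.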