Callbacks

Exponential Moving Average (EMA)

During training, EMA maintains a moving average of the trained parameters. Using the EMA parameters can yield significantly better results and faster convergence across many domains and models.

EMA is a simple calculation. The EMA weights are initialized with the model weights at the start of training.

After every training update, the EMA weights are updated from the new model weights:

\[ w_{\mathrm{ema}} \leftarrow \mathrm{decay} \cdot w_{\mathrm{ema}} + (1 - \mathrm{decay}) \cdot w_{\mathrm{model}} \]
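To make the update rule concrete, here is a minimal sketch in plain Python. It is an illustration, not NeMo's implementation: parameters are modeled as a dict of floats rather than tensors, and init_ema and update_ema are hypothetical helper names.

```python
# Minimal sketch of the EMA update rule, using plain Python floats in place
# of real parameter tensors (an assumption made for illustration only).


def init_ema(model_weights):
    """Initialize the EMA weights as a copy of the model weights."""
    return dict(model_weights)


def update_ema(ema_weights, model_weights, decay):
    """Apply ema_w = ema_w * decay + model_w * (1 - decay) to every parameter."""
    for name, model_w in model_weights.items():
        ema_weights[name] = ema_weights[name] * decay + model_w * (1 - decay)
    return ema_weights


# Hypothetical two-parameter "model" used to show a few updates.
model = {"w": 1.0, "b": 0.0}
ema = init_ema(model)

for step in range(3):
    # Stand-in for an optimizer step: pretend training moves w up by 1.0.
    model["w"] += 1.0
    update_ema(ema, model, decay=0.5)

print(model["w"], ema["w"])  # → 4.0 3.125
```

A decay of 0.5 is used here only so the numbers are easy to follow; with a typical value such as 0.999, the EMA weights trail the model weights much more slowly.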

To enable EMA, pass an additional argument to the experiment manager at runtime:

```shell
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
    model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
    trainer.devices=2 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    exp_manager.ema.enable=True  # pass this additional argument to enable EMA
```

To change the decay rate, also pass the exp_manager.ema.decay argument:

```shell
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    ... \
    exp_manager.ema.enable=True \
    exp_manager.ema.decay=0.999
```
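As a rough guide to choosing the decay, a common rule of thumb (a general property of exponential moving averages, not a NeMo-specific guarantee) is that the EMA averages over roughly 1 / (1 - decay) recent updates, so a decay of 0.999 corresponds to about 1,000 updates. The sketch below computes that window and how much of its starting value the EMA still retains after that many updates; retained_fraction and effective_window are hypothetical helper names.

```python
def retained_fraction(decay, updates):
    """Weight the EMA still places on its starting value after `updates` updates."""
    # Each update multiplies the old EMA value by `decay`, so it shrinks geometrically.
    return decay ** updates


def effective_window(decay):
    """Rule of thumb: the EMA averages over roughly 1 / (1 - decay) recent updates."""
    return 1.0 / (1.0 - decay)


for decay in (0.99, 0.999, 0.9999):
    window = effective_window(decay)
    kept = retained_fraction(decay, round(window))
    print(f"decay={decay}: ~{window:.0f} updates, start value retains {kept:.3f}")
```

For each of these decay values the retained fraction comes out close to 1/e (about 0.37). The window is counted in EMA updates, so applying EMA only every N steps stretches it to roughly N times as many training steps.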

The following optional arguments are also available:

| Argument | Description |
|---|---|
| exp_manager.ema.validate_original_weights=True | Validate the original model weights instead of the EMA weights. |
| exp_manager.ema.every_n_steps=2 | Apply the EMA update every N steps (here N=2) instead of every step. |
| exp_manager.ema.cpu_offload=True | Offload the EMA weights to CPU. This may introduce significant slowdowns. |
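To illustrate how every_n_steps and validate_original_weights might interact with the update rule, the following sketch wraps both in a small hypothetical class, EMASketch. It is not NeMo's EMA callback and makes two explicit assumptions: the update runs on steps where step % every_n_steps == 0, and, unless validate_original_weights is set, validation temporarily swaps the EMA weights into the model and restores the trained weights afterwards. CPU offloading is not modeled.

```python
from contextlib import contextmanager


class EMASketch:
    """Hypothetical illustration of every_n_steps and validate_original_weights.

    Assumptions (not taken from NeMo's implementation):
    - every_n_steps: the EMA update runs only when step % every_n_steps == 0.
    - validate_original_weights=False: EMA weights are swapped into the model
      for validation and the trained weights are restored afterwards.
    """

    def __init__(self, model_weights, decay=0.999, every_n_steps=1,
                 validate_original_weights=False):
        self.model = model_weights       # live trained weights (shared dict)
        self.ema = dict(model_weights)   # EMA weights start as a copy
        self.decay = decay
        self.every_n_steps = every_n_steps
        self.validate_original_weights = validate_original_weights
        self.num_updates = 0

    def on_train_step_end(self, step):
        if step % self.every_n_steps != 0:
            return  # skip the EMA update on this step
        for name, w in self.model.items():
            self.ema[name] = self.ema[name] * self.decay + w * (1 - self.decay)
        self.num_updates += 1

    @contextmanager
    def validation_weights(self):
        """Yield the weights that validation should see."""
        if self.validate_original_weights:
            yield self.model
            return
        original = dict(self.model)      # save the trained weights
        self.model.update(self.ema)      # swap EMA weights in
        try:
            yield self.model
        finally:
            self.model.update(original)  # restore the trained weights


model = {"w": 0.0}
ema = EMASketch(model, decay=0.5, every_n_steps=2)
for step in range(4):
    model["w"] += 1.0                    # stand-in for an optimizer step
    ema.on_train_step_end(step)

with ema.validation_weights() as weights:
    val_w = weights["w"]                 # EMA value seen during validation

print(ema.num_updates, val_w, model["w"])  # → 2 1.75 4.0
```

With every_n_steps=2, only steps 0 and 2 update the EMA, and the trained weight of 4.0 is back in place once validation finishes.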