Loss Functions
Page Summary
- Generative Adversarial Networks (GANs) aim to replicate probability distributions, employing loss functions to measure the distance between real and generated data distributions.
- TF-GAN offers various loss functions, including minimax loss (from the original GAN paper) and Wasserstein loss (the default for TF-GAN Estimators).
- GANs can utilize two loss functions, one for the generator and one for the discriminator, derived from a single distance measure, with the generator able to affect only the term reflecting the distribution of the generated data.
- Wasserstein GANs (WGANs), using Wasserstein loss, employ a "critic" instead of a discriminator, aiming to maximize the difference in its output for real and fake instances.
- WGANs offer advantages such as reduced vulnerability to training stagnation and the use of the earth mover distance, a true metric for measuring the distance between probability distributions.
GANs try to replicate a probability distribution. They should therefore use loss functions that reflect the distance between the distribution of the data generated by the GAN and the distribution of the real data.
How do you capture the difference between two distributions in GAN loss functions? This question is an area of active research, and many approaches have been proposed. We'll address two common GAN loss functions here, both of which are implemented in TF-GAN:
- minimax loss: The loss function used in the paper that introduced GANs.
- Wasserstein loss: The default loss function for TF-GAN Estimators. First described in a 2017 paper.
TF-GAN implements many other loss functions as well.
One Loss Function or Two?
A GAN can have two loss functions: one for generator training and one for discriminator training. How can two loss functions work together to reflect a distance measure between probability distributions?
In the loss schemes we'll look at here, the generator and discriminator losses derive from a single measure of distance between probability distributions. In both of these schemes, however, the generator can only affect one term in the distance measure: the term that reflects the distribution of the fake data. So during generator training we drop the other term, which reflects the distribution of the real data.
The generator and discriminator losses look different in the end, even though they derive from a single formula.
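As an illustration of this pattern, here is a minimal sketch in TensorFlow, using the minimax loss defined in the next section as the distance measure. The names `d_real` and `d_fake` (standing for D(x) and D(G(z))) are illustrative, not TF-GAN APIs:

```python
import tensorflow as tf

def discriminator_loss(d_real, d_fake):
    # The discriminator can affect both terms, so its loss keeps both:
    # it maximizes E[log D(x)] + E[log(1 - D(G(z)))], i.e. it
    # minimizes the negation below.
    return -(tf.reduce_mean(tf.math.log(d_real)) +
             tf.reduce_mean(tf.math.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # The generator cannot affect the E[log D(x)] term, so that term
    # is dropped: the generator minimizes E[log(1 - D(G(z)))] alone.
    return tf.reduce_mean(tf.math.log(1.0 - d_fake))
```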
Minimax Loss
In the paper that introduced GANs, the generator tries to minimize the following function while the discriminator tries to maximize it:

E_x[log(D(x))] + E_z[log(1 - D(G(z)))]
In this function:
- D(x) is the discriminator's estimate of the probability that real data instance x is real.
- E_x is the expected value over all real data instances.
- G(z) is the generator's output when given noise z.
- D(G(z)) is the discriminator's estimate of the probability that a fake instance is real.
- E_z is the expected value over all random inputs to the generator (in effect, the expected value over all generated fake instances G(z)).
- The formula derives from the cross-entropy between the real and generated distributions.
The generator can't directly affect the log(D(x)) term in the function, so, for the generator, minimizing the loss is equivalent to minimizing log(1 - D(G(z))).
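Since the formula is a cross-entropy, the same two losses can be written with a standard binary cross-entropy helper. This is a minimal sketch for illustration (not TF-GAN's implementation), assuming `d_real` and `d_fake` are the discriminator's probability outputs on real and generated batches:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()  # expects probabilities

def d_loss_from_cross_entropy(d_real, d_fake):
    # Real instances are labeled 1 and fakes 0; minimizing this
    # cross-entropy maximizes E[log D(x)] + E[log(1 - D(G(z)))].
    return (bce(tf.ones_like(d_real), d_real) +
            bce(tf.zeros_like(d_fake), d_fake))

def g_loss_from_cross_entropy(d_fake):
    # bce(0, d_fake) equals -E[log(1 - D(G(z)))], so its negation is
    # the E[log(1 - D(G(z)))] term the generator minimizes.
    return -bce(tf.zeros_like(d_fake), d_fake)
```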
In TF-GAN, see minimax_discriminator_loss and minimax_generator_loss for an implementation of this loss function.
Modified Minimax Loss
The original GAN paper notes that the above minimax loss function can cause the GAN to get stuck in the early stages of GAN training when the discriminator's job is very easy. The paper therefore suggests modifying the generator loss so that the generator tries to maximize log D(G(z)).
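A minimal sketch of this modified (sometimes called "non-saturating") generator loss, again assuming `d_fake` holds the probabilities D(G(z)); the function name is illustrative, not TF-GAN's API:

```python
import tensorflow as tf

def nonsaturating_generator_loss(d_fake):
    # Maximizing log D(G(z)) is equivalent to minimizing -log D(G(z)),
    # which still gives large gradients when the discriminator
    # confidently rejects the generator's samples (d_fake near 0).
    return -tf.reduce_mean(tf.math.log(d_fake))
```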
In TF-GAN, see modified_generator_loss for an implementation of this modification.
Wasserstein Loss
By default, TF-GAN uses Wasserstein loss.
This loss function depends on a modification of the GAN scheme (called "Wasserstein GAN" or "WGAN") in which the discriminator does not actually classify instances. For each instance it outputs a number. This number does not have to be between 0 and 1, so we can't use 0.5 as a threshold to decide whether an instance is real or fake. Discriminator training just tries to make the output bigger for real instances than for fake instances.
Because it can't really discriminate between real and fake, the WGAN discriminator is actually called a "critic" instead of a "discriminator". This distinction has theoretical importance, but for practical purposes we can treat it as an acknowledgement that the inputs to the loss functions don't have to be probabilities.
The loss functions themselves are deceptively simple:
Critic Loss: D(x) - D(G(z))
The discriminator tries to maximize this function. In other words, it tries to maximize the difference between its output on real instances and its output on fake instances.
Generator Loss: D(G(z))
The generator tries to maximize this function. In other words, it tries to maximize the discriminator's output for its fake instances.
In these functions:
- D(x) is the critic's output for a real instance.
- G(z) is the generator's output when given noise z.
- D(G(z)) is the critic's output for a fake instance.
- The output of critic D does not have to be between 0 and 1.
- The formulas derive from the earth mover distance between the real and generated distributions.
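A minimal sketch of these two losses written as minimization objectives, assuming `c_real` = D(x) and `c_fake` = D(G(z)) are batches of unbounded critic outputs (the names are illustrative, not TF-GAN's API):

```python
import tensorflow as tf

def critic_loss(c_real, c_fake):
    # The critic maximizes D(x) - D(G(z)), so it minimizes the negation.
    return tf.reduce_mean(c_fake) - tf.reduce_mean(c_real)

def wgan_generator_loss(c_fake):
    # The generator maximizes D(G(z)), so it minimizes -D(G(z)).
    return -tf.reduce_mean(c_fake)
```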
In TF-GAN, see wasserstein_generator_loss and wasserstein_discriminator_loss for implementations.
Requirements
The theoretical justification for the Wasserstein GAN (or WGAN) requires that the weights throughout the GAN be clipped so that they remain within a constrained range.
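For illustration, a minimal sketch of weight clipping applied after each optimizer step, assuming a Keras model and the clip bound of 0.01 used in the original WGAN paper:

```python
import tensorflow as tf

CLIP_VALUE = 0.01  # bound from the original WGAN paper

def clip_weights(model: tf.keras.Model):
    # Keep every trainable weight within [-CLIP_VALUE, CLIP_VALUE];
    # call this after each optimizer step.
    for var in model.trainable_variables:
        var.assign(tf.clip_by_value(var, -CLIP_VALUE, CLIP_VALUE))
```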
Benefits
Wasserstein GANs are less vulnerable to getting stuck than minimax-based GANs, and avoid problems with vanishing gradients. The earth mover distance also has the advantage of being a true metric: a measure of distance in a space of probability distributions. Cross-entropy is not a metric in this sense.