Neural networks: Multi-class classification
Page Summary
- This document explores multi-class classification models, which predict one of multiple possible classes rather than just one of two, as binary classification models do.
- Multi-class classification can be achieved through two main approaches: one-vs.-all and one-vs.-one (softmax).
- One-vs.-all uses multiple binary classifiers, one for each possible outcome, to determine the probability of each class independently.
- One-vs.-one (softmax) predicts the probability of each class relative to all the other classes, using the softmax function to ensure that the probabilities sum to 1.
- Softmax is efficient when there are few classes but becomes computationally expensive with many; candidate sampling offers a more efficient alternative.
Earlier, you encountered binary classification models that could pick between one of two possible choices, such as whether:
- A given email is spam or not spam.
- A given tumor is malignant or benign.
In this section, we'll investigate multi-class classification models, which can pick from multiple possibilities. For example:
- Is this dog a beagle, a basset hound, or a bloodhound?
- Is this flower a Siberian Iris, Dutch Iris, Blue Flag Iris, or Dwarf Bearded Iris?
- Is that plane a Boeing 747, Airbus 320, Boeing 777, or Embraer 190?
- Is this an image of an apple, bear, candy, dog, or egg?
Some real-world multi-class problems entail choosing from millions of separate classes. For example, consider a multi-class classification model that can identify the image of just about anything.
This section details the two main variants of multi-class classification:
- one-vs.-all
- one-vs.-one, which is usually known as softmax
One versus all
One-vs.-all provides a way to use binary classification for a series of yes or no predictions across multiple possible labels.
Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers—one binary classifier for each possible outcome. During training, the model runs through a sequence of binary classifiers, training each to answer a separate classification question.
For example, given a picture of a piece of fruit, four different recognizers might be trained, each answering a different yes/no question:
- Is this image an apple?
- Is this image an orange?
- Is this image a pear?
- Is this image a grape?
The following image illustrates how this works in practice.
[Figure: a separate binary classifier answers each yes/no fruit question]
This approach is fairly reasonable when the total number of classes is small, but becomes increasingly inefficient as the number of classes rises.
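To make the idea concrete, here is a minimal sketch of the separate-classifier approach, assuming a Keras-style setup with made-up data shapes and class names (the feature size, epoch count, and fruit labels are illustrative, not part of the original example):

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 200 examples with 64 features, integer labels 0..3
# standing in for apple, orange, pear, grape.
num_classes = 4
features = np.random.rand(200, 64).astype("float32")
labels = np.random.randint(0, num_classes, size=200)

# One-vs.-all with separate models: one independent binary (sigmoid)
# classifier per class, each answering its own yes/no question.
classifiers = []
for class_id in range(num_classes):
    yes_no_labels = (labels == class_id).astype("float32")
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(64,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(features, yes_no_labels, epochs=5, verbose=0)
    classifiers.append(model)

# Each classifier scores its own class independently, so the four scores
# for a single image need not sum to 1.0.
scores = [m.predict(features[:1], verbose=0)[0, 0] for m in classifiers]
```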
We can create a significantly more efficient one-vs.-all model with a deep neural network in which each output node represents a different class. The following image illustrates this approach.
[Figure 8: one-vs.-all implemented as a single neural network with one output node per class]
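The following is a minimal Keras-style sketch of that idea, assuming illustrative input and hidden-layer sizes: a single network whose output layer has one sigmoid node per class.

```python
import tensorflow as tf

# One shared hidden layer and one sigmoid output node per class.
# The layer sizes are assumptions made for illustration.
num_classes = 4
one_vs_all_model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="sigmoid"),  # one yes/no score per class
])
# Each output node acts as its own binary classifier, so binary
# cross-entropy is applied to every node independently.
one_vs_all_model.compile(optimizer="adam", loss="binary_crossentropy")
```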
One versus one (softmax)
You may have noticed that the probability values in the output layer of Figure 8 don't sum to 1.0 (or 100%). (In fact, they sum to 1.43.) In a one-vs.-all approach, the probability of each binary set of outcomes is determined independently of all the other sets. That is, we're determining the probability of "apple" versus "not apple" without considering the likelihood of our other fruit options: "orange", "pear", or "grape."
But what if we want to predict the probabilities of each fruit relative to each other? In this case, instead of predicting "apple" versus "not apple", we want to predict "apple" versus "orange" versus "pear" versus "grape". This type of multi-class classification is called one-vs.-one classification.
We can implement one-vs.-one classification using the same type of neural network architecture used for one-vs.-all classification, with one key change: we need to apply a different transform to the output layer.
For one-vs.-all, we applied the sigmoid activation function to each output node independently, which resulted in an output value between 0 and 1 for each node, but did not guarantee that these values summed to exactly 1.
For one-vs.-one, we can instead apply a function called softmax, which assigns decimal probabilities to each class in a multi-class problem such that all probabilities add up to 1.0. This additional constraint helps training converge more quickly than it otherwise would.
The softmax equation is as follows:

$$p(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{w}_j^\top \mathbf{x} + b_j}}{\sum_{k \in K} e^{\mathbf{w}_k^\top \mathbf{x} + b_k}}$$

Note that this formula basically extends the formula for logistic regression into multiple classes.
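As a sketch of the computation, here is the softmax function in plain NumPy, with the usual max-subtraction trick for numerical stability; the input logits below are made-up numbers:

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits into probabilities that sum to 1.0."""
    # Subtracting the max does not change the result (softmax is
    # shift-invariant) but prevents overflow in exp().
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

# Example: raw scores for apple, orange, pear, grape (made-up numbers).
print(softmax(np.array([2.0, 1.0, 0.5, -1.0])))
# -> roughly [0.61, 0.22, 0.14, 0.03]; the values add up to 1.0
```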
The following image re-implements our one-vs.-all multi-class classification task as a one-vs.-one task. Note that in order to perform softmax, the hidden layer directly preceding the output layer (called the softmax layer) must have the same number of nodes as the output layer.
[Figure 9: the fruit classifier re-implemented as a one-vs.-one (softmax) network]
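A minimal Keras-style sketch of the change, assuming the same illustrative layer sizes as before: only the output activation and the loss differ from the one-vs.-all network.

```python
import tensorflow as tf

# One-vs.-one (softmax) variant: the output layer now produces a
# probability distribution over the classes that sums to 1.0.
num_classes = 4
softmax_model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # probabilities sum to 1.0
])
# Integer class labels (0..3) pair with sparse categorical cross-entropy.
softmax_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```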
Softmax options
Consider the following variants of softmax:
- Full softmax is the softmax we've been discussing; that is, softmax calculates a probability for every possible class.
- Candidate sampling means that softmax calculates a probability for all the positive labels but only for a random sample of negative labels. For example, if we are interested in determining whether an input image is a beagle or a bloodhound, we don't have to provide probabilities for every non-doggy example.
Full softmax is fairly cheap when the number of classes is small but becomes prohibitively expensive when the number of classes climbs. Candidate sampling can improve efficiency in problems having a large number of classes.
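As a rough illustration of the idea (not a production implementation, which would also correct for the bias that sampling introduces, as TensorFlow's sampled softmax does), here is a NumPy sketch that normalizes only over the true class plus a random handful of negatives:

```python
import numpy as np

def sampled_softmax_loss(logits, true_class, num_sampled, rng=None):
    """Toy illustration of candidate sampling for a single example."""
    rng = rng or np.random.default_rng()
    num_classes = logits.shape[0]
    # Draw a few candidate negatives and drop the true class if it was drawn.
    negatives = rng.choice(num_classes, size=num_sampled, replace=False)
    negatives = negatives[negatives != true_class]
    candidate_logits = np.concatenate(([logits[true_class]], logits[negatives]))
    # Softmax restricted to the sampled candidates; the true class is index 0.
    exps = np.exp(candidate_logits - np.max(candidate_logits))
    return -np.log(exps[0] / exps.sum())

# With 100,000 classes, the loss above only ever touches about 21 of them.
logits = np.random.randn(100_000)
loss = sampled_softmax_loss(logits, true_class=42, num_sampled=20)
```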
One label versus many labels
Softmax assumes that each example is a member of exactly one class. Some examples, however, can simultaneously be a member of multiple classes. For such examples:
- You may not use softmax.
- You must rely on multiple logistic regressions.
For example, the one-vs.-one model in Figure 9 above assumes that each input image will depict exactly one type of fruit: an apple, an orange, a pear, or a grape. However, if an input image might contain multiple types of fruit—a bowl of both apples and oranges—you'll have to use multiple logistic regressions instead.
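A minimal sketch of that multi-label setup, assuming Keras and the same illustrative sizes as before: sigmoid outputs trained with binary cross-entropy, so each output node acts as its own logistic regression.

```python
import tensorflow as tf

# Multi-label case: an image can score high for several fruits at once,
# so no softmax is applied across the output nodes.
num_classes = 4
multi_label_model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="sigmoid"),
])
multi_label_model.compile(optimizer="adam", loss="binary_crossentropy")
# Targets are multi-hot vectors, e.g. [1, 1, 0, 0] for a bowl of apples and oranges.
```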