Discriminative model

From Wikipedia, the free encyclopedia

Mathematical model used for classification or regression

Discriminative models, also referred to as conditional models, are a class of models frequently used for classification. In machine learning, a discriminative model typically models the conditional distribution P(Y∣X), or learns a direct decision rule that maps inputs X to outputs Y. Discriminative models are commonly used for classification and regression, where the main goal is accurate prediction on new data. A typical use is binary classification, i.e. assigning labels such as pass/fail, win/lose, alive/dead or healthy/sick to data points. Discriminative models are usually trained to separate classes or to minimize prediction error under a chosen loss function. They are often contrasted with generative models, which aim to model how the data are generated and can be used to sample new data.

Types of discriminative models include logistic regression (LR), conditional random fields (CRFs), and decision trees, among many others.

Definition

Unlike generative modelling, which studies the joint probability $P(x,y)$, discriminative modelling studies the conditional probability $P(y|x)$, or directly maps an input $x$ to a class label $y$ using a rule learned from the observed training samples. For example, in object recognition, $x$ is likely to be a vector of raw pixels (or features extracted from the raw pixels of the image). Within a probabilistic framework, this is done by modelling the conditional probability distribution $P(y|x)$, which can be used for predicting $y$ from $x$. Note that there is still a distinction between the conditional model and the discriminative model, though more often both are simply categorised as discriminative models.

Pure discriminative model vs. conditional model

A conditional model models the conditional probability distribution, while the traditional discriminative model aims to optimize the mapping of the input to the most similar trained samples.^[1]

Contrast with generative model

In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent,^[a] but three major types can be distinguished:

A generative model is a statistical model of the joint probability distribution $P(X,Y)$ on a given observable variable X and target variable Y. A generative model can be used to "generate" random instances (outcomes) of an observation x.
A discriminative model is a model of the conditional probability $P(Y\mid X=x)$ of the target Y, given an observation x. It can be used to "discriminate" the value of the target variable Y, given an observation x.
Classifiers computed without using a probability model are also referred to loosely as "discriminative".

The distinction between these last two classes is not consistently made.

An alternative division defines these symmetrically as:

a generative model is a model of the conditional probability of the observable X, given a target y, symbolically, $P(X\mid Y=y)$
a discriminative model is a model of the conditional probability of the target Y, given an observation x, symbolically, $P(Y\mid X=x)$

Regardless of the precise definition, the terminology is descriptive: a generative model can be used to "generate" random instances (outcomes), either of an observation and target $(x,y)$, or of an observation x given a target value y, while a discriminative model or discriminative classifier (without a model) can be used to "discriminate" the value of the target variable Y, given an observation x.
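The two uses can be illustrated with a small sketch. The joint table below is a hypothetical distribution over two binary variables, invented purely for illustration: sampling from it is the "generative" use, while the conditional $P(Y\mid X=x)$ is the quantity a discriminative model targets directly (here it is simply derived from the joint).

```python
import random

# Hypothetical joint distribution P(X, Y) over two binary variables (illustrative numbers).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample_joint(rng):
    """Generative use: draw an (x, y) pair from P(X, Y)."""
    pairs = list(joint)
    return rng.choices(pairs, weights=[joint[p] for p in pairs], k=1)[0]

def conditional(x):
    """Discriminative quantity: P(Y | X = x), derived here from the joint table."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(X = x)
    return {y: joint[(x, y)] / p_x for y in (0, 1)}

rng = random.Random(0)
x, y = sample_joint(rng)   # a generated observation/target pair
cond = conditional(1)      # distribution over Y given X = 1
```

A purely discriminative model would store only something like `conditional` and would have no way to implement `sample_joint`.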

Contrast in approaches

Suppose we are given $m$ class labels and $n$ feature variables, $Y:\{y_{1},y_{2},\ldots ,y_{m}\},X:\{x_{1},x_{2},\ldots ,x_{n}\}$, as the training samples.

A generative model estimates the joint probability $P(x,y)$, where $x$ is the input and $y$ is the label, and predicts the most probable label ${\widetilde {y}}\in Y$ for an unseen input ${\widetilde {x}}$ using Bayes' theorem.

Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer variables to compute). On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of selecting a discriminative versus generative model.

Discriminative and generative models also differ in how they obtain the posterior probability. To minimize the expected loss, the misclassification rate should be minimized. In a discriminative model, the posterior probability $P(y|x)$ is inferred directly from a parametric model whose parameters are estimated from the training data, either as point estimates obtained by maximizing the likelihood or as a distribution computed over the parameters. A generative model, by contrast, focuses on the joint probability, so the class posterior is obtained through Bayes' theorem:

$$P(y|x)={\frac {p(x|y)\,p(y)}{\sum _{i}p(x|i)\,p(i)}}={\frac {p(x|y)\,p(y)}{p(x)}}$$
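As a minimal sketch of the generative route, the snippet below applies Bayes' theorem to hypothetical class-conditional likelihoods $p(x|y)$ and priors $p(y)$ for a single observed $x$. The class names and numbers are assumptions chosen for illustration only.

```python
def posterior(likelihoods, priors):
    """Return P(y|x) for every class y from p(x|y) and p(y) via Bayes' theorem."""
    joint = {y: likelihoods[y] * priors[y] for y in priors}  # p(x|y) p(y)
    evidence = sum(joint.values())                            # p(x) = sum_i p(x|i) p(i)
    return {y: joint[y] / evidence for y in joint}

# Hypothetical values of p(x|y) for one observed x, and class priors p(y).
likelihoods = {"healthy": 0.10, "sick": 0.60}
priors = {"healthy": 0.90, "sick": 0.10}

post = posterior(likelihoods, priors)
prediction = max(post, key=post.get)  # class with the highest posterior
```

With these numbers the posterior is roughly 0.6 for "healthy" and 0.4 for "sick": the prior outweighs the larger likelihood of the "sick" class.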

Advantages and disadvantages in application

In repeated experiments applying logistic regression and naive Bayes to binary classification tasks, discriminative learning reaches a lower asymptotic error, while generative learning approaches its (higher) asymptotic error faster. However, in Ulusoy and Bishop's joint work, Comparison of Generative and Discriminative Techniques for Object Detection and Classification, they state that the above statement is true only when the model is the appropriate one for the data (i.e. the data distribution is correctly modeled by the generative model).

Advantages

Significant advantages of using discriminative modeling are:

Higher accuracy, which mostly leads to better learning results
Allows simplification of the input and provides a direct approach to $P(y|x)$
Saves computational resources
Generates lower asymptotic errors

By comparison, generative modeling has the following characteristics:

Takes all data into consideration, which can result in slower processing as a disadvantage
Requires fewer training samples
Provides a flexible framework that can easily be combined with other needs of the application

Disadvantages

Training usually requires multiple numerical optimization techniques
By definition, a discriminative model needs to combine multiple subtasks to solve a complex real-world problem

Typical discriminative modelling approaches

The following approaches assume that a training data set $D=\{(x_{i};y_{i})|i\leq N\in \mathbb {Z} \}$ is given, where $y_{i}$ is the corresponding output for the input $x_{i}$.^[2]

Linear classifier

We intend to use the function $f(x)$ to reproduce the behavior observed in the training data set by the linear classifier method. Using the joint feature vector $\phi (x,y)$, the decision function is defined as:

$$f(x;w)=\arg \max _{y}w^{T}\phi (x,y)$$

According to Memisevic's interpretation,^[2] $w^{T}\phi (x,y)$, also written $c(x,y;w)$, computes a score that measures the compatibility of the input $x$ with the potential output $y$. The $\arg \max$ then selects the class with the highest score.
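The decision function can be sketched in a few lines. The joint feature map used here, which copies $x$ into the block of the vector belonging to class $y$, is one common choice and is an assumption of this example; the source does not prescribe a particular $\phi$. The weights are likewise made up.

```python
def phi(x, y, num_classes):
    """Joint feature vector (assumed form): x placed in the block for class y, zeros elsewhere."""
    vec = [0.0] * (len(x) * num_classes)
    start = y * len(x)
    vec[start:start + len(x)] = x
    return vec

def score(w, x, y, num_classes):
    """Compatibility score c(x, y; w) = w^T phi(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, y, num_classes)))

def predict(w, x, num_classes):
    """f(x; w) = argmax_y w^T phi(x, y)."""
    return max(range(num_classes), key=lambda y: score(w, x, y, num_classes))

# Hypothetical weights for 3 classes and 2 input features.
w = [1.0, -1.0,   # block for class 0
     -1.0, 1.0,   # block for class 1
     0.5, 0.5]    # block for class 2

labels = [predict(w, x, 3) for x in ([2.0, 0.0], [0.0, 3.0], [1.0, 1.0])]
```

With this block-structured $\phi$, the classifier is equivalent to keeping one weight vector per class and picking the class whose weight vector has the largest dot product with $x$.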

Logistic regression (LR)

Since the 0-1 loss function is commonly used in decision theory, the conditional probability distribution $P(y|x;w)$, where $w$ is a parameter vector fitted to the training data, can be written as follows for the logistic regression model:

$$P(y|x;w)={\frac {1}{Z(x;w)}}\exp(w^{T}\phi (x,y))$$

with

$$Z(x;w)=\sum _{y}\exp(w^{T}\phi (x,y))$$

The equation above represents logistic regression. Notice that a major distinction between models is their way of introducing the posterior probability; here it is inferred directly from the parametric model. The parameters can then be estimated by maximizing the log-likelihood:

$$L(w)=\sum _{i}\log p(y^{i}|x^{i};w)$$

Equivalently, one can minimize the log-loss:

$$l^{\log }(x^{i},y^{i},c(x^{i};w))=-\log p(y^{i}|x^{i};w)=\log Z(x^{i};w)-w^{T}\phi (x^{i},y^{i})$$
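The conditional distribution and the log-loss can be checked numerically with a short sketch. It takes the per-class scores $w^{T}\phi (x,y)$ as given (the score values are hypothetical) and subtracts the maximum score before exponentiating, a standard numerical-stability step that leaves the probabilities unchanged.

```python
import math

def conditional_probs(scores):
    """P(y|x; w) for each y, given the scores w^T phi(x, y)."""
    m = max(scores)                           # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                             # Z(x; w) divided by exp(m)
    return [e / z for e in exps]

def log_loss(scores, true_y):
    """Log-loss for one example: log Z(x; w) - w^T phi(x, y_true)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return log_z - scores[true_y]

scores = [2.0, 0.0, 1.0]          # hypothetical scores for three classes
probs = conditional_probs(scores)
loss = log_loss(scores, true_y=0)  # equals -log(probs[0])
```

With two classes and scores `[a, 0.0]`, `conditional_probs` reduces to the familiar sigmoid `1 / (1 + exp(-a))` for the first class.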

Since the log-loss is differentiable, a gradient-based method can be used to optimize the model. A global optimum is guaranteed because the objective function is convex. The gradient of the log-likelihood is given by:

$${\frac {\partial L(w)}{\partial w}}=\sum _{i}\left(\phi (x^{i},y^{i})-E_{p(y|x^{i};w)}\left[\phi (x^{i},y)\right]\right)$$

where $E_{p(y|x^{i};w)}$ denotes the expectation under $p(y|x^{i};w)$.

This method is computationally efficient when the number of classes is relatively small.
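A minimal training sketch using this gradient follows. It assumes a block-structured joint feature map (one weight row per class), for which the gradient block of class $y$ is $(\mathbb{1}[y=y^{i}]-p(y|x^{i};w))\,x^{i}$. The toy data, step size and iteration count are illustrative assumptions, not values from the source.

```python
import math

def probs(W, x):
    """P(y|x; W) with one weight row per class."""
    scores = [sum(wj * xj for wj, xj in zip(row, x)) for row in W]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def gradient(W, data):
    """Sum over examples of phi(x, y_true) - E_{p(y|x;W)}[phi(x, y)]."""
    grad = [[0.0] * len(row) for row in W]
    for x, y_true in data:
        p = probs(W, x)
        for y in range(len(W)):
            indicator = 1.0 if y == y_true else 0.0
            for j, xj in enumerate(x):
                grad[y][j] += (indicator - p[y]) * xj
    return grad

def log_likelihood(W, data):
    return sum(math.log(probs(W, x)[y]) for x, y in data)

def fit(data, num_classes, steps=200, lr=0.1):
    """Plain gradient ascent on the log-likelihood, starting from zero weights."""
    W = [[0.0] * len(data[0][0]) for _ in range(num_classes)]
    for _ in range(steps):
        g = gradient(W, data)
        W = [[w + lr * gj for w, gj in zip(row, grow)] for row, grow in zip(W, g)]
    return W

# Toy data: (features with a constant 1.0 bias term, label).
data = [([1.0, 0.0, 1.0], 0), ([0.9, 0.2, 1.0], 0),
        ([0.0, 1.0, 1.0], 1), ([0.1, 0.8, 1.0], 1)]
W = fit(data, num_classes=2)
```

Each gradient evaluation loops over every class for every example, which is why the cost stays modest only while the number of classes is small.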

Training objectives and Optimizations in applications

Since both ways of modeling have advantages and disadvantages, combining the two approaches can work well in practice. For example, in the article A Joint Discriminative Generative Model for Deformable Model Construction and Classification,^[3] Marras and his coauthors apply a combination of the two kinds of modeling to face classification and obtain a higher accuracy than the traditional approach.

Similarly, Kelm^[4] also proposed combining the two kinds of modeling for pixel classification in his article Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning.

When extracting discriminative features prior to clustering, principal component analysis (PCA), though commonly used, is not necessarily a discriminative approach. In contrast, linear discriminant analysis (LDA) is discriminative.^[5] LDA offers an efficient way of mitigating the disadvantage listed above: a discriminative model needs a combination of multiple subtasks before classification, and LDA addresses this by reducing the dimension.

Empirical risk minimization
Common loss functions (log loss, hinge loss, squared loss)
Regularization (L1/L2)
Optimization methods (gradient descent family)
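The first three items can be sketched together as a regularized empirical risk. The example assumes binary labels in {-1, +1} and a linear score $f(x)=w^{T}x$, with each loss written as a function of the margin $y\,f(x)$; the data, weights and penalty strengths are hypothetical.

```python
import math

def log_loss(margin):
    """Logistic loss as a function of the margin y * f(x)."""
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    """Hinge loss, as used by support vector machines."""
    return max(0.0, 1.0 - margin)

def squared_loss(margin):
    """Squared loss; equals (y - f(x))^2 when y is +1 or -1."""
    return (1.0 - margin) ** 2

def regularized_risk(w, data, loss, l2=0.0, l1=0.0):
    """Average loss over the training set plus L1/L2 penalties on w."""
    total = 0.0
    for x, y in data:
        f = sum(wj * xj for wj, xj in zip(w, x))  # linear score f(x) = w^T x
        total += loss(y * f)
    penalty = l2 * sum(wj * wj for wj in w) + l1 * sum(abs(wj) for wj in w)
    return total / len(data) + penalty

# Hypothetical two-example training set and weight vector.
data = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w = [1.0, -1.0]

risks = {name: regularized_risk(w, data, fn, l2=0.1)
         for name, fn in [("log", log_loss), ("hinge", hinge_loss), ("squared", squared_loss)]}
```

Any of the gradient descent variants in the last item would then be applied to `regularized_risk` as a function of `w`; that step is omitted here.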

Families and types

Examples of discriminative models include:

Logistic regression, a type of generalized linear regression used for predicting binary or categorical outputs (also known as maximum entropy classifiers)
Boosting (meta-algorithm)
Conditional random fields
Linear regression
Computer vision
Random forests
k-nearest neighbors algorithm
Support vector machines
Decision tree learning
Maximum-entropy Markov models

See also

Generative model

Notes

^[a] Three leading sources, Ng & Jordan 2002, Jebara 2004, and Mitchell 2015, give different divisions and definitions.

References

^[1] Ballesteros, Miguel. "Discriminative Models" (PDF). Retrieved October 28, 2018. [permanent dead link]
^[2] Memisevic, Roland (December 21, 2006). "An introduction to structured discriminative learning". Retrieved October 29, 2018.
^[3] Marras, Ioannis (2017). "A Joint Discriminative Generative Model for Deformable Model Construction and Classification" (PDF). Retrieved 5 November 2018.
^[4] Kelm, B. Michael. "Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning" (PDF). Archived from the original (PDF) on 17 July 2019. Retrieved 5 November 2018.
^[5] Wang, Zhangyang (2015). "A Joint Optimization Framework of Sparse Coding and Discriminative Clustering" (PDF). Retrieved 5 November 2018.

Sources

Jebara, Tony (2004). Machine Learning: Discriminative and Generative. The Springer International Series in Engineering and Computer Science. Kluwer Academic (Springer). ISBN 978-1-4020-7647-3.
Mitchell, Tom M. (2015). "3. Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression" (PDF). Machine Learning.
Ng, Andrew Y.; Jordan, Michael I. (2002). "On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes" (PDF). Advances in Neural Information Processing Systems.

