Structured prediction or structured output learning is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than discrete or real values.[1]
Similar to commonly used supervised learning techniques, structured prediction models are typically trained by means of observed data in which the predicted value is compared to the ground truth, and this comparison is used to adjust the model parameters. Due to the complexity of the model and the interrelations of the predicted variables, exact model training and inference are often computationally infeasible, so approximate inference and learning methods are used.
An example application is the problem of translating a natural language sentence into a syntactic representation such as a parse tree. This can be seen as a structured prediction problem[2] in which the structured output domain is the set of all possible parse trees. Structured prediction is used in a wide variety of domains including bioinformatics, natural language processing (NLP), speech recognition, and computer vision.
Sequence tagging is a class of problems prevalent in NLP in which input data are often sequential, for instance sentences of text. The sequence tagging problem appears in several guises, such as part-of-speech tagging (POS tagging) and named entity recognition. In POS tagging, for example, each word in a sequence must be 'tagged' with a class label representing the type of word (e.g., noun, verb, or determiner).
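A minimal illustration of what a POS-tagged output looks like (the example sentence and Penn Treebank-style tag names here are assumptions for illustration, not taken from the article):

```python
# Each token in the input sentence is paired with a part-of-speech tag:
# DT = determiner, NN = noun, VBZ = verb (3rd person singular present),
# VBN = verb (past participle).
sentence = ["this", "sentence", "is", "tagged"]
tags = ["DT", "NN", "VBZ", "VBN"]

tagged = list(zip(sentence, tags))
# tagged is the structured output: a tag sequence, not a single label.
```

The structured output is the entire tag sequence, whose length varies with the input sentence.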
The main challenge of this problem is to resolve ambiguity: many English words, such as "sentence" and "tagged", can be either nouns or verbs depending on context.
While this problem can be solved by simply performing classification of individual tokens, this approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word. This fact can be exploited in a sequence model such as a hidden Markov model or conditional random field[2] that predicts the entire tag sequence for a sentence (rather than just individual tags) via the Viterbi algorithm.
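As a sketch of how Viterbi decoding exploits tag-to-tag dependence, the following toy hidden Markov model tagger finds the jointly best tag sequence. All probabilities and the tiny tag/word vocabulary are invented for illustration:

```python
import math

def logp(p):
    # Log-probability, with log(0) mapped to -infinity.
    return math.log(p) if p > 0 else float("-inf")

tags = ["DT", "NN", "VB"]
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}          # P(tag at position 0)
trans = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},  # P(next tag | tag)
         "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
         "VB": {"DT": 0.4, "NN": 0.4, "VB": 0.2}}
emit = {"DT": {"the": 0.7},                        # P(word | tag)
        "NN": {"dog": 0.6, "barks": 0.1},
        "VB": {"dog": 0.1, "barks": 0.7}}

def viterbi(words):
    # best[t] = (log-prob of the best tag path ending in tag t, that path)
    best = {t: (logp(start[t]) + logp(emit[t].get(words[0], 0)), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            # Choose the predecessor maximizing path score + transition score.
            p = max(tags, key=lambda q: best[q][0] + logp(trans[q][t]))
            score, path = best[p]
            new[t] = (score + logp(trans[p][t]) + logp(emit[t].get(w, 0)),
                      path + [t])
        best = new
    return max(best.values(), key=lambda sp: sp[0])[1]
```

Because the transition scores enter every step, a word like "barks" is tagged as a verb when it follows a noun, even though token-by-token classification could not use that context.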
Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models.
One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins.[3] This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data). Abstractly, for each training example it predicts the highest-scoring structure under the current weights; if that prediction differs from the gold structure, it updates the weights toward the gold structure's features and away from the prediction's features.
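The update rule above can be sketched as follows. The feature map and the tiny data set are illustrative assumptions, and the exhaustive argmax stands in for the Viterbi search that a real implementation would use:

```python
from itertools import product

TAGS = ["D", "N", "V"]  # toy tag set

def features(words, tags):
    # Indicator features: emission (word, tag) and transition (prev_tag, tag).
    feats = {}
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[(w, t)] = feats.get((w, t), 0) + 1
        feats[(prev, t)] = feats.get((prev, t), 0) + 1
        prev = t
    return feats

def score(weights, feats):
    return sum(weights.get(f, 0) * v for f, v in feats.items())

def argmax(weights, words):
    # Exhaustive search over all tag sequences (exponential; demo only).
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(weights, features(words, list(tags))))

def train(data, epochs=5):
    weights = {}
    for _ in range(epochs):
        for words, gold in data:
            pred = list(argmax(weights, words))
            if pred != gold:
                # Perceptron update: reward gold features, penalize predicted.
                for f, v in features(words, gold).items():
                    weights[f] = weights.get(f, 0) + v
                for f, v in features(words, pred).items():
                    weights[f] = weights.get(f, 0) - v
    return weights

data = [(["the", "dog", "barks"], ["D", "N", "V"]),
        (["the", "cat", "sleeps"], ["D", "N", "V"])]
w = train(data)
```

Note that the weight update touches only features that differ between the gold and predicted structures, which is what makes the method a direct generalization of the binary perceptron.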
In practice, the argmax over the set of candidate structures is found using an algorithm such as Viterbi or max-sum message passing, rather than an exhaustive search through an exponentially large set of candidates.
The idea of learning is similar to that for multiclass perceptrons.