Structured prediction or structured output learning is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than discrete or real values.[1]
Similar to commonly used supervised learning techniques, structured prediction models are typically trained by means of observed data in which the predicted value is compared to the ground truth, and this comparison is used to adjust the model parameters. Due to the complexity of the model and the interrelations of the predicted variables, exact model training and inference are often computationally infeasible, so approximate inference and learning methods are used.
An example application is the problem of translating a natural language sentence into a syntactic representation such as a parse tree. This can be seen as a structured prediction problem[2] in which the structured output domain is the set of all possible parse trees. Structured prediction is used in a wide variety of domains including bioinformatics, natural language processing (NLP), speech recognition, and computer vision.
Sequence tagging is a class of problems prevalent in NLP in which input data are often sequential, for instance sentences of text. The sequence tagging problem appears in several guises, such as part-of-speech tagging (POS tagging) and named entity recognition. In POS tagging, for example, each word in a sequence must be 'tagged' with a class label representing the type of word (e.g., noun, verb, or determiner).
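A minimal illustration of what a POS-tagged output looks like (the example sentence and Penn Treebank-style tag names here are assumptions for illustration, not taken from the article):

```python
# Each token in the input sentence is paired with a part-of-speech tag:
# DT = determiner, NN = noun, VBZ = verb (3rd person singular present),
# VBN = verb (past participle).
sentence = ["this", "sentence", "is", "tagged"]
tags = ["DT", "NN", "VBZ", "VBN"]

tagged = list(zip(sentence, tags))
# tagged is the structured output: a tag sequence, not a single label.
```

The structured output is the entire tag sequence, whose length varies with the input sentence.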
The main challenge of this problem is to resolve ambiguity: many English words, such as "sentence" and "tagged", can be either nouns or verbs depending on context.
While this problem can be solved by simply performing classification of individual tokens, this approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word. This fact can be exploited in a sequence model such as a hidden Markov model or conditional random field[2] that predicts the entire tag sequence for a sentence (rather than just individual tags) via the Viterbi algorithm.
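As a sketch of how Viterbi decoding exploits tag-to-tag dependence, the following toy hidden Markov model tagger finds the jointly best tag sequence. All probabilities and the tiny tag/word vocabulary are invented for illustration:

```python
import math

def logp(p):
    # Log-probability, with log(0) mapped to -infinity.
    return math.log(p) if p > 0 else float("-inf")

tags = ["DT", "NN", "VB"]
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}          # P(tag at position 0)
trans = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},  # P(next tag | tag)
         "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
         "VB": {"DT": 0.4, "NN": 0.4, "VB": 0.2}}
emit = {"DT": {"the": 0.7},                        # P(word | tag)
        "NN": {"dog": 0.6, "barks": 0.1},
        "VB": {"dog": 0.1, "barks": 0.7}}

def viterbi(words):
    # best[t] = (log-prob of the best tag path ending in tag t, that path)
    best = {t: (logp(start[t]) + logp(emit[t].get(words[0], 0)), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            # Choose the predecessor maximizing path score + transition score.
            p = max(tags, key=lambda q: best[q][0] + logp(trans[q][t]))
            score, path = best[p]
            new[t] = (score + logp(trans[p][t]) + logp(emit[t].get(w, 0)),
                      path + [t])
        best = new
    return max(best.values(), key=lambda sp: sp[0])[1]
```

Because the transition scores enter every step, a word like "barks" is tagged as a verb when it follows a noun, even though token-by-token classification could not use that context.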
Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models.
One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins.[3] This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data). Abstractly, for each training example it predicts the highest-scoring structure under the current weights; if that prediction differs from the gold structure, it updates the weights toward the gold structure's features and away from the prediction's features.
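The update rule above can be sketched as follows. The feature map and the tiny data set are illustrative assumptions, and the exhaustive argmax stands in for the Viterbi search that a real implementation would use:

```python
from itertools import product

TAGS = ["D", "N", "V"]  # toy tag set

def features(words, tags):
    # Indicator features: emission (word, tag) and transition (prev_tag, tag).
    feats = {}
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[(w, t)] = feats.get((w, t), 0) + 1
        feats[(prev, t)] = feats.get((prev, t), 0) + 1
        prev = t
    return feats

def score(weights, feats):
    return sum(weights.get(f, 0) * v for f, v in feats.items())

def argmax(weights, words):
    # Exhaustive search over all tag sequences (exponential; demo only).
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(weights, features(words, list(tags))))

def train(data, epochs=5):
    weights = {}
    for _ in range(epochs):
        for words, gold in data:
            pred = list(argmax(weights, words))
            if pred != gold:
                # Perceptron update: reward gold features, penalize predicted.
                for f, v in features(words, gold).items():
                    weights[f] = weights.get(f, 0) + v
                for f, v in features(words, pred).items():
                    weights[f] = weights.get(f, 0) - v
    return weights

data = [(["the", "dog", "barks"], ["D", "N", "V"]),
        (["the", "cat", "sleeps"], ["D", "N", "V"])]
w = train(data)
```

Note that the weight update touches only features that differ between the gold and predicted structures, which is what makes the method a direct generalization of the binary perceptron.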
In practice, the argmax over the set of candidate structures is found using an algorithm such as Viterbi or max-sum message passing, rather than an exhaustive search through an exponentially large set of candidates.
The idea of learning is similar to that for multiclass perceptrons.