Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis).[1] Examples include diagnostic tests, identifying spam emails and deciding whether to give someone a driving license.
As well as 'category', synonyms or near-synonyms for 'class' include 'type', 'species', 'forms', 'order', 'concept', 'taxon', 'group', 'identification' and 'division'.
The word 'classification' (and its synonyms) may take on one of several related meanings. It may encompass both classification and the creation of classes, as for example in 'the task of categorizing pages in Wikipedia'; this overall activity is listed under taxonomy. It may refer exclusively to the underlying scheme of classes (which otherwise may be called a taxonomy). Or it may refer to the label given to an object by the classifier.
Classification is a part of many different kinds of activities and is studied from many different points of view, including medicine, philosophy,[2] law, anthropology, biology, taxonomy, cognition, communications, knowledge organization, psychology, statistics, machine learning, economics and mathematics.
Methodological work aimed at improving the accuracy of a classifier is commonly divided between cases where there are exactly two classes (binary classification) and cases where there are three or more classes (multiclass classification).
Unlike in decision theory, it is assumed that a classifier repeats the classification task over and over. And unlike a lottery, it is assumed that each classification can be either right or wrong; in the theory of measurement, classification is understood as measurement against a nominal scale. Thus it is possible to try to measure the accuracy of a classifier.
Measuring the accuracy of a classifier allows a choice to be made between two alternative classifiers. This is important both when developing a classifier and in choosing which classifier to deploy. There are, however, many different methods for evaluating the accuracy of a classifier and no general method for determining which method should be used in which circumstances. Different fields have taken different approaches, even in binary classification (see Evaluation of binary classifiers). In pattern recognition, error rate is popular. The Gini coefficient and KS statistic are widely used in the credit scoring industry. Sensitivity and specificity are widely used in epidemiology and medicine. Precision and recall are widely used in information retrieval.[3]
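As an illustration of how several of these measures relate to one another, the following minimal Python sketch computes error rate, sensitivity, specificity, precision and recall from the same set of binary predictions. The labels, the `BinaryCounts` type and the function names are hypothetical and chosen only for this example; the sketch assumes labels are coded as 1 (positive) and 0 (negative) and that every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Counts of the four possible outcomes of a binary classification."""
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative
    fn: int  # actual positive, predicted negative


def count_outcomes(actual, predicted):
    """Tally outcomes from paired 0/1 labels (1 = positive class)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return BinaryCounts(tp, fp, tn, fn)


# Each function assumes its denominator is non-zero.
def error_rate(c):
    """Fraction of all objects that were misclassified."""
    return (c.fp + c.fn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c):
    """Fraction of actual positives that were predicted positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of actual negatives that were predicted negative."""
    return c.tn / (c.tn + c.fp)


def precision(c):
    """Fraction of predicted positives that were actually positive."""
    return c.tp / (c.tp + c.fp)


def recall(c):
    """Recall is the same quantity as sensitivity under another name."""
    return sensitivity(c)


# Hypothetical data: 4 actual positives followed by 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = count_outcomes(actual, predicted)
print("error rate :", error_rate(counts))
print("sensitivity:", sensitivity(counts))
print("specificity:", specificity(counts))
print("precision  :", precision(counts))
print("recall     :", recall(counts))
```

On this made-up data the classifier makes one false negative and one false positive. Because the measures use different denominators, the same two mistakes yield different values for each measure, which is one reason a ranking of classifiers can depend on the measure chosen.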
Classifier accuracy depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems (a phenomenon that may be explained by the no-free-lunch theorem).