A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information,[1] while descriptive statistics (in the mass noun sense) is the process of using and analysing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.[2] This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory, and its measures are frequently nonparametric statistics.[3] Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented.[4] For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related co-morbidities, etc.
Some measures that are commonly used to describe a data set are measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance), the minimum and maximum values of the variables, kurtosis and skewness.[5]
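As an illustration, the measures listed above can be computed with Python's standard library. This is a minimal sketch, not a reference implementation: the function name `describe` and the `ages` data are hypothetical, the variance and standard deviation use the population (divide by n) formulas on the assumption that the sample itself is what is being described, and skewness and kurtosis use the simple moment-based definitions (other conventions exist).

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a non-empty list of numbers.

    Assumes the values are not all identical (otherwise the standard
    deviation is zero and skewness/kurtosis are undefined).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (divide by n)

    # Third and fourth central moments, used for the shape measures below.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.pvariance(values),
        "std_dev": sd,
        "min": min(values),
        "max": max(values),
        "skewness": m3 / sd ** 3,  # moment-based skewness
        "kurtosis": m4 / sd ** 4,  # moment-based (non-excess) kurtosis
    }


# Hypothetical sample of subject ages, for illustration only.
ages = [23, 25, 25, 31, 38, 44, 52]
summary = describe(ages)
```

For this made-up sample the mean (34) exceeds the median (31) and the skewness is positive, which is consistent with a longer tail toward higher ages.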
Descriptive statistics provide simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. These summaries may either form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. For example, a player who shoots 33% is making approximately one shot in every three. The percentage summarizes or describes multiple discrete events. Consider also the grade point average. This single number describes the general performance of a student across the range of their course experiences.[6]
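Both examples reduce many individual events to one number, which a short sketch can make concrete. The function names and the input figures below are hypothetical, and the grade point average is assumed here to be a credit-weighted mean of grade points; institutions define it in different ways.

```python
def shooting_percentage(made, attempted):
    """Shots made divided by shots taken, expressed as a percentage."""
    if attempted == 0:
        raise ValueError("no shots attempted")
    return 100.0 * made / attempted


def grade_point_average(courses):
    """Credit-weighted mean of grade points (an assumed definition).

    courses: list of (grade_points, credit_hours) pairs.
    """
    total_credits = sum(credits for _, credits in courses)
    if total_credits == 0:
        raise ValueError("no credits recorded")
    return sum(points * credits for points, credits in courses) / total_credits


# Hypothetical figures: 4 of 12 shots made, and three courses.
pct = shooting_percentage(made=4, attempted=12)
gpa = grade_point_average([(4.0, 3), (3.0, 4), (2.0, 3)])
```

Here `pct` is about 33.3, matching the "one shot in every three" reading, and `gpa` is 3.0.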
The use of descriptive and summary statistics has an extensive history and, indeed, the simple tabulation of populations and of economic data was the first way the topic of statistics appeared. More recently, a collection of summarisation techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot.
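A box plot is usually drawn from a five-number summary: the minimum, lower quartile, median, upper quartile and maximum. The sketch below computes those five numbers with Python's `statistics.quantiles`; the function name `five_number_summary` and the data are hypothetical, and quartile conventions differ between textbooks and software, so other tools may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Uses the default "exclusive" method of statistics.quantiles;
    this is one of several common quartile conventions.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))


# Hypothetical data, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary5 = five_number_summary(data)
```

In a box plot the box would span the two quartiles, with a line at the median and whiskers extending toward the minimum and maximum (plotting libraries often cap the whiskers and mark outlying points separately).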
In the business world, descriptive statistics provides a useful summary of many types of data. For example, investors and brokers may use a historical account of return behaviour by performing empirical and analytical analyses on their investments in order to make better investing decisions in the future.
When a sample consists of more than one variable, descriptive statistics may be used to describe the relationship between pairs of variables. In this case, descriptive statistics include:
The main reason for differentiating univariate and bivariate analysis is that bivariate analysis is not only a simple descriptive analysis but also describes the relationship between two different variables.[7] Quantitative measures of dependence include correlation (such as Pearson's r when both variables are continuous, or Spearman's rho if one or both are not) and covariance (which reflects the scale on which the variables are measured). The slope, in regression analysis, also reflects the relationship between variables. The unstandardised slope indicates the unit change in the criterion variable for a one unit change in the predictor. The standardised slope indicates this change in standardised (z-score) units. Highly skewed data are often transformed by taking logarithms. The use of logarithms makes graphs more symmetrical and more similar in appearance to the normal distribution, making them easier to interpret intuitively.[8]: 47
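The measures of dependence described above can be sketched in plain Python. Everything named here (the helper functions and the `hours`/`score` data) is hypothetical; the covariance uses the sample (n - 1) denominator, Spearman's rho is computed as Pearson's r on ranks with ties sharing their average rank, and the slope is the ordinary least-squares slope for a single predictor.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator); its size depends on the units."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


def slope(xs, ys):
    """Unstandardised least-squares slope of y (criterion) on x (predictor)."""
    return covariance(xs, ys) / covariance(xs, xs)


# Hypothetical data: hours studied (predictor) and test score (criterion).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

cov = covariance(hours, score)
r = pearson_r(hours, score)
rho = spearman_rho(hours, score)
b = slope(hours, score)
# Standardised slope: the change in z-score units of y per z-score unit of x.
b_std = b * math.sqrt(covariance(hours, hours) / covariance(score, score))
```

With a single predictor, the standardised slope `b_std` coincides with Pearson's r, whereas the unstandardised slope `b` (4.1 score points per hour in this made-up data) and the covariance both change if either variable is measured in different units.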