Statistical data collection is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data often follows the study protocol specified prior to the study being conducted. The data from a study can also be analyzed to consider secondary hypotheses inspired by the initial results, or to suggest new studies. A secondary analysis of the data from a planned study uses tools from data analysis, and the process of doing this is mathematical statistics.
Data analysis is divided into:
descriptive statistics – the part of statistics that describes data, i.e. summarises the data and their typical properties.
inferential statistics – the part of statistics that draws conclusions from data (using some model for the data): for example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of a particular model, and quantifying the uncertainty involved (e.g. using confidence intervals).
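The two parts of data analysis can be contrasted in a short sketch. The sample values, the function names, and the use of a normal approximation (z = 1.96) are assumptions made for illustration only; with a sample this small, a t-based interval would usually be preferred.

```python
import math
import statistics


def describe(sample):
    """Descriptive statistics: summarise the sample itself."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
    }


def mean_confidence_interval(sample, z=1.96):
    """Inferential statistics: approximate 95% interval for the population mean.

    Assumes independent observations and a normal approximation for the
    sample mean; both are modelling assumptions, not facts about the data.
    """
    n = len(sample)
    centre = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width


# Made-up measurements, used only to exercise the functions.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]

summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(summary)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The summary describes only the ten observed values, whereas the interval is a statement about the unobserved population mean and is only as trustworthy as the assumed model.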
While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, for example data from natural experiments and observational studies. In such cases the inference depends on the model chosen by the statistician, and is therefore subjective.[4][5]
Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)
Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed total number of independent occurrences
Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs
Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures before the first success; a special case of the negative binomial distribution, where the number of successes is one.
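The relationships among these four distributions can be checked numerically. The sketch below writes out the probability mass functions directly from their standard definitions; the function names and the parameterisation (p as the success probability, k as a count of failures for the negative binomial and geometric cases) are choices made here for illustration.

```python
from math import comb


def bernoulli_pmf(k, p):
    """P(X = k) for a single trial with success probability p, k in {0, 1}."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def negative_binomial_pmf(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def geometric_pmf(k, p):
    """P(exactly k failures occur before the first success)."""
    return p * (1 - p) ** k


p = 0.3
# A binomial with a single trial reduces to the Bernoulli distribution.
print(binomial_pmf(1, 1, p), bernoulli_pmf(1, p))
# A negative binomial with r = 1 reduces to the geometric distribution.
print(negative_binomial_pmf(4, 1, p), geometric_pmf(4, p))
```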
Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.[8] Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. Inferential statistics are used to test hypotheses and make estimations using sample data. Whereas descriptive statistics describe a sample, inferential statistics infer predictions about a larger population that the sample represents.
The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a conclusion before implementing some organizational or governmental policy. For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process is obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses:
a statistical model of the random process that is supposed to generate the data, which is known when randomization has been used, and
a particular realization of the random process; i.e., a set of data.
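One way to see why the model is known when randomization has been used is a randomization (permutation) test: if the treatment has no effect, every assignment of units to groups was equally likely, so the reference distribution can be enumerated rather than assumed. The sketch below is illustrative only; the data are made up, and the function name and the one-sided difference-in-means statistic are choices made here.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(treated, control):
    """Exact one-sided p-value for a difference in means.

    The model is the randomization itself: under the null hypothesis of no
    treatment effect, each way of choosing which units are 'treated' is
    equally likely. The p-value is the fraction of assignments whose mean
    difference is at least as large as the observed one.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    total = sum(pooled)
    observed = mean(treated) - mean(control)

    assignments = 0
    at_least_as_extreme = 0
    for indices in combinations(range(len(pooled)), n_treated):
        treated_sum = sum(pooled[i] for i in indices)
        diff = treated_sum / n_treated - (total - treated_sum) / n_control
        assignments += 1
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / assignments


# Made-up outcomes for a small randomized experiment (the realization).
treated = [12, 14, 15, 16]
control = [8, 9, 10, 11]
p_value = randomization_p_value(treated, control)
print(f"one-sided randomization p-value: {p_value:.4f}")
```

Full enumeration is only practical for small samples; for larger ones a random subset of assignments is commonly sampled instead.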
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many ways for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
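As a minimal sketch of the most common case, the code below estimates a conditional expectation that is assumed to be a straight line in a single independent variable, using ordinary least squares, and summarises the variation around the fitted regression function by a residual standard deviation. The linear form, the data, and the function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for E[Y | X = x] = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_std(x, y, intercept, slope):
    """Spread of the dependent variable around the fitted regression function."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Two parameters were estimated, hence n - 2 degrees of freedom.
    return sqrt(sum(r * r for r in residuals) / (len(x) - 2))


# Made-up observations, roughly following y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
sigma = residual_std(x, y, intercept, slope)
print(f"fitted regression function: y = {intercept:.3f} + {slope:.3f} * x")
print(f"residual standard deviation: {sigma:.3f}")
```

The slope is the estimated change in the average value of the dependent variable per unit change in the independent variable, under the assumed linear form.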
Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods are suited to "ordinal" data.
As non-parametric methods make fewer assumptions, their applicability is much wider than the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods are more robust.
One drawback of non-parametric methods is that, since they rely on fewer assumptions, they are generally less powerful than their parametric counterparts.[10] Low power in non-parametric tests is problematic because these methods are commonly used when the sample size is small.[10] Many parametric methods are proven to be the most powerful tests through methods such as the Neyman–Pearson lemma and the likelihood-ratio test.
Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.
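The sign test is one example of how simple a non-parametric procedure can be. The sketch below assumes paired data reduced to differences, drops zero differences (a common convention, assumed here), and uses only the signs, so that under the null hypothesis of a zero median the number of positive signs follows a binomial distribution with success probability 1/2. The function name and the data are illustrative.

```python
from math import comb


def sign_test_p_value(differences):
    """Exact two-sided sign test for the hypothesis that the median is zero.

    Only the signs of the differences are used, never their magnitudes,
    so no distributional shape is assumed.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Made-up paired differences; the second set contains one extreme value.
plain = [1.2, 0.8, 1.5, 0.3, 2.1, 0.9, 1.1, 0.4, 1.7, 0.6]
with_outlier = plain[:-1] + [1000.0]

print(sign_test_p_value(plain))
print(sign_test_p_value(with_outlier))
```

Both calls return the same p-value, because the extreme value changes a magnitude but not a sign. The same property also shows the cost: discarding magnitudes is part of why such tests tend to have lower power.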
Statistics, mathematics, and mathematical statistics
Mathematical statistics is a key subset of the discipline of statistics. Statistical theorists study and improve statistical procedures with mathematics, and statistical research often raises mathematical questions.