Mathematical statistics

From Wikipedia, the free encyclopedia

Branch of statistics

Not to be confused with Mathematics and statistics, Mathematics, or Statistics.

Illustration of linear regression on a data set. Regression analysis is an important part of mathematical statistics.

Mathematical statistics is the application of probability theory and other mathematical concepts to statistics, as opposed to techniques for collecting statistical data.^[1] Specific mathematical techniques that are commonly used in statistics include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.^[2]^[3]

Introduction

Statistical data collection is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data often follows the study protocol specified prior to the study being conducted. The data from a study can also be analyzed to consider secondary hypotheses inspired by the initial results, or to suggest new studies. A secondary analysis of the data from a planned study uses tools from data analysis, and the process of doing this is mathematical statistics.

Data analysis is divided into:

descriptive statistics – the part of statistics that describes data, i.e. summarises the data and their typical properties.
inferential statistics – the part of statistics that draws conclusions from data (using some model for the data). For example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of a particular model, and quantifying the uncertainty involved (e.g. using confidence intervals).
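
The two parts can be contrasted on a small data set. The following is a minimal sketch using only the Python standard library; the sample values are made up for illustration, and the interval uses a normal approximation, which is an assumption of this sketch and not a method prescribed by the text.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample (made-up measurements, for illustration only).
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]

# Descriptive statistics: summarise the data at hand.
n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: quantify uncertainty about the population mean.
# Assumption: a normal approximation is used for the 95% interval. With a
# sample this small, a Student's t quantile would give a wider interval.
z = NormalDist().inv_cdf(0.975)
half_width = z * stdev / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

print(f"n={n}, mean={mean:.3f}, median={median:.3f}, stdev={stdev:.3f}")
print(f"approximate 95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The first group of numbers only describes the ten observations. The interval is a statement about the population they are assumed to come from, and it is only as good as the model behind it.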

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, for example data from natural experiments and observational studies. In that case the inference depends on the model chosen by the statistician, and so is subjective.^[4]^[5]

Topics

The following are some of the important topics in mathematical statistics:^[6]^[7]

Probability distributions

Main article: Probability distribution

A probability distribution is a function that assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference. Examples are found in experiments whose sample space is non-numerical, where the distribution would be a categorical distribution; experiments whose sample space is encoded by discrete random variables, where the distribution can be specified by a probability mass function; and experiments with sample spaces encoded by continuous random variables, where the distribution can be specified by a probability density function. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures.

A probability distribution can either be univariate or multivariate. A univariate distribution gives the probabilities of a single random variable taking on various alternative values; a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector (a set of two or more random variables) taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. The multivariate normal distribution is a commonly encountered multivariate distribution.
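
The difference between a probability mass function and a probability density function can be made concrete with two of the distributions named above. This is an illustrative sketch: the helper `binomial_pmf` and the parameter values are chosen here and are not part of the article.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete case: the mass function assigns a probability to every outcome,
# and the probabilities sum to one.
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))  # equals n * p

# Continuous case: a density is not a probability by itself; probabilities
# are obtained by integrating it over an interval.
std_normal = NormalDist(mu=0.0, sigma=1.0)
prob_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# Midpoint-rule integral of the density over [-1, 1] as a numerical check.
steps = 10_000
width = 2.0 / steps
riemann = sum(std_normal.pdf(-1.0 + (i + 0.5) * width) for i in range(steps)) * width

print(f"sum of pmf = {total:.6f}, mean = {mean:.6f}")
print(f"P(-1 < Z < 1) = {prob_within_one_sd:.6f}, integral of pdf = {riemann:.6f}")
```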

Special distributions

Normal distribution, the most common continuous distribution
Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)
Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed total number of independent occurrences
Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs
Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures before the first success; a special case of the negative binomial distribution, where the number of successes is one
Discrete uniform distribution, for a finite set of values (e.g. the outcome of a fair die)
Continuous uniform distribution, for continuously distributed values
Poisson distribution, for the number of occurrences of a Poisson-type event in a given period of time
Exponential distribution, for the time before the next Poisson-type event occurs
Gamma distribution, for the time before the next k Poisson-type events occur
Chi-squared distribution, the distribution of a sum of squared independent standard normal variables; useful e.g. for inference regarding the sample variance of normally distributed samples (see chi-squared test)
Student's t distribution, the distribution of the ratio of a standard normal variable and the square root of a scaled chi-squared variable; useful for inference regarding the mean of normally distributed samples with unknown variance (see Student's t-test)
Beta distribution, for a single probability (real number between 0 and 1); conjugate to the Bernoulli distribution and binomial distribution
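
Two of the relationships stated in this list can be checked directly: the geometric distribution is the negative binomial distribution with one success, and a beta prior combined with binomial data gives a beta posterior. The sketch below uses the "number of failures" parameterisation given above; the function names and the numbers are illustrative assumptions.

```python
import math


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(K = k): probability of k failures before the r-th success."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k


def geometric_pmf(k: int, p: float) -> float:
    """P(K = k): probability of k failures before the first success."""
    return p * (1 - p) ** k


def beta_posterior(alpha: float, beta: float, successes: int, failures: int):
    """Conjugate update: Beta(alpha, beta) prior with binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


p = 0.25
geometric_matches = all(
    math.isclose(geometric_pmf(k, p), negative_binomial_pmf(k, 1, p))
    for k in range(50)
)

# Uniform prior Beta(1, 1), then 7 successes and 3 failures are observed.
posterior = beta_posterior(1.0, 1.0, successes=7, failures=3)
posterior_mean = beta_mean(*posterior)

print("geometric equals negative binomial with r = 1:", geometric_matches)
print("posterior parameters:", posterior, "posterior mean:", round(posterior_mean, 4))
```

Conjugacy is what keeps the update a matter of adding counts to the prior parameters; no integration is needed.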

Statistical inference

Main article: Statistical inference

Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.^[8] Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. Inferential statistics are used to test hypotheses and make estimations using sample data. Whereas descriptive statistics describe a sample, inferential statistics infer predictions about a larger population that the sample represents.

The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a conclusion before implementing some organizational or governmental policy. For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process is obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses:

a statistical model of the random process that is supposed to generate the data, which is known when randomization has been used, and
a particular realization of the random process; i.e., a set of data.
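
These two ingredients can be separated in a short simulation. In the sketch below the model is a sequence of independent Bernoulli trials, the realization is one simulated data set, and the inference is a point estimate together with an exact one-sided binomial test. The value of `true_p`, the sample size, the null value 0.5 and the helper names are all assumptions made for illustration.

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(successes: int, n: int, p0: float) -> float:
    """P(X >= successes) under the null model X ~ Binomial(n, p0)."""
    return sum(binomial_pmf(k, n, p0) for k in range(successes, n + 1))


# Statistical model: n independent Bernoulli(p) trials with unknown p.
# true_p is hypothetical; in practice the analyst never sees it.
true_p = 0.7
n = 50

# A particular realization of the random process: one set of data.
rng = random.Random(0)
data = [1 if rng.random() < true_p else 0 for _ in range(n)]

# Inference about the parameter p from that realization.
successes = sum(data)
p_hat = successes / n                               # point estimate of p
p_value = upper_tail_p_value(successes, n, p0=0.5)  # test of H0: p = 0.5

print(f"successes = {successes} of {n}, estimate of p = {p_hat:.2f}")
print(f"one-sided p-value against p = 0.5: {p_value:.4g}")
```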

Regression

Main article: Regression analysis

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many ways for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
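
In symbols, and using notation that is assumed here and not taken from the article, write Y for the dependent variable and X for the vector of independent variables. The most common regression function is the conditional mean m, the error term ε captures the variation around it, and the quantile case replaces the mean by the conditional τ-quantile:

```latex
m(x) = \operatorname{E}[\, Y \mid X = x \,], \qquad
Y = m(X) + \varepsilon, \qquad
\operatorname{E}[\, \varepsilon \mid X \,] = 0,

q_\tau(x) = \inf \{\, y : \Pr(Y \le y \mid X = x) \ge \tau \,\}, \qquad 0 < \tau < 1 .
```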

Many techniques for carrying out regression analysis have been developed. Familiar methods, such as linear regression, are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data (e.g. using ordinary least squares). Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
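
As a small instance of the parametric case, the sketch below fits a straight line with one independent variable by ordinary least squares, using the closed-form solution for the two unknown parameters. The data points and the function name `ols_line` are made up for illustration.

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_line(xs, ys)

# Residuals: variation of the dependent variable around the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

The regression function here is fully determined by two numbers, which is what makes the method parametric. A nonparametric method would instead estimate the function from a much larger class, for example all smooth functions.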

Nonparametric statistics

Main article: Nonparametric statistics

Nonparametric statistics are values calculated from data in a way that is not based on parameterized families of probability distributions. They include both descriptive and inferential statistics. The typical parameters of such families are the expectation, the variance, and so on. Unlike parametric statistics, nonparametric statistics make no assumptions about the probability distributions of the variables being assessed.^[9]

Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods result in "ordinal" data.
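
A rank-based statistic uses only the ordering of the observations. The sketch below computes the Mann–Whitney U statistic for two sets of star ratings by counting pairs; the ratings are invented, and the function name is an assumption of this example. Because only comparisons between values enter the calculation, any order-preserving relabelling of the stars leaves the statistic unchanged.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs (a, b) with a > b, ties counted as 1/2."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical star ratings (1 to 4) for two films.
film_a = [4, 3, 4, 2, 3, 4]
film_b = [2, 1, 3, 2, 1, 3]

u_a = mann_whitney_u(film_a, film_b)
u_b = mann_whitney_u(film_b, film_a)

# The two statistics always add up to the number of pairs.
pairs = len(film_a) * len(film_b)

# Squaring the ratings changes the numbers but not their order.
u_a_relabelled = mann_whitney_u([r**2 for r in film_a], [r**2 for r in film_b])

print(f"U_a = {u_a}, U_b = {u_b}, pairs = {pairs}")
print("unchanged under an order-preserving relabelling:", u_a == u_a_relabelled)
```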

As non-parametric methods make fewer assumptions, their applicability is much wider than the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods are more robust.

One drawback of non-parametric methods is that, because they rely on fewer assumptions, they are generally less powerful than their parametric counterparts.^[10] This low power is problematic because a common use of these methods is with small samples.^[10] Many parametric methods are proven to be the most powerful tests through methods such as the Neyman–Pearson lemma and the likelihood-ratio test.
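
The loss of power can be seen in a small Monte Carlo experiment. The sketch below compares a one-sided z-test with a one-sided sign test on simulated normal data with known unit variance. The shift, sample size, number of trials and seed are arbitrary choices made for illustration, and the z-test is only valid because the simulation satisfies its assumptions.

```python
import math
import random
from statistics import NormalDist


def sign_test_critical_count(n: int, alpha: float) -> int:
    """Smallest c such that P(Binomial(n, 1/2) >= c) <= alpha."""
    tail = 0.0
    critical = n + 1  # n + 1 means the test can never reject
    for k in range(n, -1, -1):
        tail += math.comb(n, k) * 0.5**n
        if tail > alpha:
            break
        critical = k
    return critical


def estimate_power(shift: float, n: int, alpha: float, trials: int, seed: int):
    """Rejection rates of the z-test and the sign test for H0: center = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    count_crit = sign_test_critical_count(n, alpha)
    z_rejections = 0
    sign_rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(shift, 1.0) for _ in range(n)]
        # Parametric test: uses normality and the known standard deviation 1.
        if (sum(xs) / n) * math.sqrt(n) > z_crit:
            z_rejections += 1
        # Non-parametric test: uses only the signs of the observations.
        if sum(1 for x in xs if x > 0) >= count_crit:
            sign_rejections += 1
    return z_rejections / trials, sign_rejections / trials


z_power, sign_power = estimate_power(shift=0.5, n=20, alpha=0.05, trials=2000, seed=1)
print(f"estimated power: z-test {z_power:.3f}, sign test {sign_power:.3f}")
```

Part of the gap in this setting comes from the discreteness of the sign test, whose attainable significance level with 20 observations is below the nominal 5%. If the data were heavy-tailed instead of normal, the comparison could look quite different.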

Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.

Statistics, mathematics, and mathematical statistics

Mathematical statistics is a key subset of the discipline of statistics. Statistical theorists study and improve statistical procedures with mathematics, and statistical research often raises mathematical questions.

Mathematicians and statisticians like Gauss, Laplace, and C. S. Peirce used decision theory with probability distributions and loss functions (or utility functions). The decision-theoretic approach to statistical inference was reinvigorated by Abraham Wald and his successors^[11]^[12]^[13]^[14]^[15]^[16]^[17] and makes extensive use of scientific computing, analysis, and optimization; for the design of experiments, statisticians use algebra and combinatorics. But while statistical practice often relies on probability and decision theory, their application can be controversial.^[5]
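
The role of the loss function in the decision-theoretic approach can be illustrated with a small optimization. In the sketch below a single number is chosen to summarise a data set, and the best choice depends on the loss: squared-error loss leads to the mean, absolute-error loss to the median. The data, the grid of candidate values and the function names are assumptions made for this example.

```python
import statistics


def squared_loss(x: float, estimate: float) -> float:
    return (x - estimate) ** 2


def absolute_loss(x: float, estimate: float) -> float:
    return abs(x - estimate)


def average_loss(data, estimate, loss) -> float:
    """Empirical risk of an estimate under the given loss function."""
    return sum(loss(x, estimate) for x in data) / len(data)


# Hypothetical data with one large value.
data = [1.0, 2.0, 2.0, 3.0, 10.0]

# Candidate estimates on a grid from 0.00 to 10.00 in steps of 0.01.
candidates = [i / 100 for i in range(0, 1001)]

best_squared = min(candidates, key=lambda a: average_loss(data, a, squared_loss))
best_absolute = min(candidates, key=lambda a: average_loss(data, a, absolute_loss))

print(f"minimiser of squared loss:  {best_squared}  (mean   = {statistics.mean(data)})")
print(f"minimiser of absolute loss: {best_absolute}  (median = {statistics.median(data)})")
```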

References

[1] Shao, Jun (2008-02-03). Mathematical Statistics. Springer Science & Business Media. ISBN 978-0-387-21718-5.
[2] Kannan, D.; Lakshmikantham, V., eds. (2002). Handbook of stochastic analysis and applications. New York: M. Dekker. ISBN 0824706609.
[3] Schervish, Mark J. (1995). Theory of statistics (Corr. 2nd print. ed.). New York: Springer. ISBN 0387945466.
[4] Freedman, D. A. (2005). Statistical Models: Theory and Practice. Cambridge University Press. ISBN 978-0-521-67105-7.
[5] Freedman, David A. (2010). Collier, David; Sekhon, Jasjeet S.; Stark, Philip B. (eds.). Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Cambridge University Press. ISBN 978-0-521-12390-7.
[6] Hogg, R. V.; Craig, A.; McKean, J. W. (2005). Intro to Mathematical Statistics.
[7] Larsen, Richard J.; Marx, Morris L. (2012). An Introduction to Mathematical Statistics and Its Applications. Prentice Hall.
[8] Upton, G.; Cook, I. (2008). Oxford Dictionary of Statistics. OUP. ISBN 978-0-19-954145-4.
[9] "Research Nonparametric Methods". Carnegie Mellon University. Retrieved August 30, 2022.
[10] "Nonparametric Tests". sphweb.bumc.bu.edu. Retrieved 2022-08-31.
[11] Wald, Abraham (1947). Sequential analysis. New York: John Wiley and Sons. ISBN 0-471-91806-7. See Dover reprint, 2004: ISBN 0-486-43912-7.
[12] Wald, Abraham (1950). Statistical Decision Functions. New York: John Wiley and Sons.
[13] Lehmann, Erich (1997). Testing Statistical Hypotheses (2nd ed.). ISBN 0-387-94919-4.
[14] Lehmann, Erich; Casella, George (1998). Theory of Point Estimation (2nd ed.). ISBN 0-387-98502-6.
[15] Bickel, Peter J.; Doksum, Kjell A. (2001). Mathematical Statistics: Basic and Selected Topics. Vol. 1 (Second (updated printing 2007) ed.). Pearson Prentice-Hall.
[16] Le Cam, Lucien (1986). Asymptotic Methods in Statistical Decision Theory. Springer-Verlag. ISBN 0-387-96307-3.
[17] Liese, Friedrich; Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer.

Further reading

Borovkov, A. A. (1999). Mathematical Statistics. CRC Press. ISBN 90-5699-018-7.
Virtual Laboratories in Probability and Statistics (Univ. of Ala.-Huntsville)
StatiBot, interactive online expert system on statistical tests.
Ray, Manohar; Sharma, Har Swarup (1966). Mathematical Statistics. Ram Prasad & Sons. ISBN 978-9383385188.
