Multivariate statistics

From Wikipedia, the free encyclopedia
Simultaneous observation and analysis of more than one outcome variable
"Multivariate analysis" redirects here. For the usage in mathematics, see Multivariable calculus.

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., multivariate random variables. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.

In addition, multivariate statistics is concerned with multivariate probability distributions, in terms of both

  • how these can be used to represent the distributions of observed data;
  • how they can be used as part of statistical inference, particularly where several different quantities are of interest to the same analysis.

Certain types of problems involving multivariate data, for example simple linear regression and multiple regression, are not usually considered to be special cases of multivariate statistics because the analysis is dealt with by considering the (univariate) conditional distribution of a single outcome variable given the other variables.

Multivariate analysis

See also: Univariate analysis

Multivariate analysis (MVA) is based on the principles of multivariate statistics. Typically, MVA is used to address situations where multiple measurements are made on each experimental unit and the relations among these measurements and their structures are important.[1] A modern, overlapping categorization of MVA includes:[1]

Multivariate analysis can be complicated by the desire to include physics-based analysis to calculate the effects of variables for a hierarchical "system-of-systems". Often, studies that wish to use multivariate analysis are stalled by the dimensionality of the problem. These concerns are often eased through the use of surrogate models, highly accurate approximations of the physics-based code. Since surrogate models take the form of an equation, they can be evaluated very quickly. This becomes an enabler for large-scale MVA studies: while a Monte Carlo simulation across the design space is difficult with physics-based codes, it becomes trivial when evaluating surrogate models, which often take the form of response-surface equations.
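
As a rough illustration of the surrogate-model idea, the following Python sketch fits a second-order response-surface equation to a small number of runs of a stand-in "expensive" function and then runs a Monte Carlo simulation on the surrogate alone. The function `expensive_model`, the two inputs, the design space [-1, 1]^2 and the sample sizes are all hypothetical choices made for the example; they do not come from any particular physics-based code.

```python
import numpy as np

def expensive_model(x1, x2):
    """Hypothetical stand-in for a slow physics-based code."""
    return 1.0 + 2.0 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + 0.8 * x1**2

def quadratic_features(x1, x2):
    """Columns of a full second-order response-surface equation."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(0)

# A small design: 30 "expensive" runs over the assumed design space [-1, 1]^2.
x1_train = rng.uniform(-1.0, 1.0, size=30)
x2_train = rng.uniform(-1.0, 1.0, size=30)
y_train = expensive_model(x1_train, x2_train)

# Fit the response-surface coefficients by ordinary least squares.
coef, *_ = np.linalg.lstsq(
    quadratic_features(x1_train, x2_train), y_train, rcond=None
)

def surrogate(x1, x2):
    """Cheap approximation: evaluate the fitted response-surface equation."""
    return quadratic_features(x1, x2) @ coef

# Monte Carlo over the design space using only the cheap surrogate.
x1_mc = rng.uniform(-1.0, 1.0, size=100_000)
x2_mc = rng.uniform(-1.0, 1.0, size=100_000)
y_mc = surrogate(x1_mc, x2_mc)

print("estimated mean response:", y_mc.mean())
print("estimated 95th percentile:", np.percentile(y_mc, 95))
```

Because the stand-in function is itself quadratic, the fitted surface reproduces it almost exactly; a real physics-based code would leave some residual error that should be checked against held-out runs before the surrogate is trusted.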

Types of analysis

Many different models are used in MVA, each with its own type of analysis:

  1. Multivariate analysis of variance (MANOVA) extends the analysis of variance to cover cases where there is more than one dependent variable to be analyzed simultaneously; see also Multivariate analysis of covariance (MANCOVA).
  2. Multivariate regression attempts to determine a formula that can describe how elements in a vector of variables respond simultaneously to changes in others. For linear relations, regression analyses here are based on forms of the general linear model. Some suggest that multivariate regression is distinct from multivariable regression; however, that distinction is debated and is not applied consistently across scientific fields.[2]
  3. Principal components analysis (PCA) creates a new set of orthogonal variables that contain the same information as the original set. It rotates the axes of variation to give a new set of orthogonal axes, ordered so that they summarize decreasing proportions of the variation.
  4. Factor analysis is similar to PCA but allows the user to extract a specified number of synthetic variables, fewer than the original set, leaving the remaining unexplained variation as error. The extracted variables are known as latent variables or factors; each one may be supposed to account for covariation in a group of observed variables.
  5. Canonical correlation analysis finds linear relationships between two sets of variables; it is the generalised (i.e. canonical) version of bivariate[3] correlation.
  6. Redundancy analysis[4] (RDA) is similar to canonical correlation analysis but allows the user to derive a specified number of synthetic variables from one set of (independent) variables that explain as much variance as possible in another (dependent) set. It is a multivariate analogue of regression.[5]
  7. Correspondence analysis (CA), or reciprocal averaging, finds (like PCA) a set of synthetic variables that summarise the original set. The underlying model assumes chi-squared dissimilarities among records (cases).
  8. Canonical (or "constrained") correspondence analysis (CCA) summarises the joint variation in two sets of variables (like redundancy analysis); it combines correspondence analysis and multivariate regression analysis. The underlying model assumes chi-squared dissimilarities among records (cases).
  9. Multidimensional scaling comprises various algorithms to determine a set of synthetic variables that best represent the pairwise distances between records. The original method is principal coordinates analysis (PCoA; based on PCA).
  10. Discriminant analysis, or canonical variate analysis, attempts to establish whether a set of variables can be used to distinguish between two or more groups of cases.
  11. Linear discriminant analysis (LDA) computes a linear predictor from two sets of normally distributed data to allow for classification of new observations.
  12. Clustering systems assign objects into groups (called clusters) so that objects (cases) from the same cluster are more similar to each other than objects from different clusters.
  13. Recursive partitioning creates a decision tree that attempts to correctly classify members of the population based on a dichotomous dependent variable.
  14. Artificial neural networks extend regression and clustering methods to non-linear multivariate models.
  15. Statistical graphics such as tours, parallel coordinate plots and scatterplot matrices can be used to explore multivariate data.
  16. Simultaneous equations models involve more than one regression equation, with different dependent variables, estimated together.
  17. Vector autoregression involves simultaneous regressions of various time series variables on their own and each other's lagged values.
  18. Principal response curves analysis (PRC) is a method based on RDA that allows the user to focus on treatment effects over time by correcting for changes in control treatments over time.[6]
  19. Iconography of correlations consists in replacing a correlation matrix by a diagram where the "remarkable" correlations are represented by a solid line (positive correlation), or a dotted line (negative correlation).
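
To make the description of principal components analysis in the list above more concrete, the following Python sketch computes principal components from an eigendecomposition of the sample covariance matrix. It is only one common way of carrying out PCA; the function name `pca` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca(X):
    """Return (scores, components, variances) for a data matrix X.

    X has one row per record (case) and one column per variable.
    """
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # rotate the data onto the new axes
    return scores, eigvecs, eigvals

rng = np.random.default_rng(0)

# Synthetic data: three correlated variables built from two latent sources
# plus a little noise (purely illustrative).
latent = rng.normal(size=(200, 2))
mixing = np.array([[2.0, 1.0, 0.5],
                   [0.0, 1.0, -1.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(200, 3))

scores, components, variances = pca(X)
print("proportion of variation per component:", variances / variances.sum())
```

The new axes are orthogonal, the total variance is unchanged by the rotation, and the components come out ordered by decreasing share of the variation, which matches the properties stated in the list entry.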

Dealing with incomplete data

In an experimentally acquired data set, the values of some components of a given data point are very often missing. Rather than discarding the whole data point, it is common to "fill in" values for the missing components, a process called "imputation".[7]
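
The simplest form of imputation replaces each missing value with the mean of the observed values of the same variable. The Python sketch below shows this column-mean approach on a small made-up data matrix in which missing entries are coded as NaN; the function name and the data are hypothetical, and more refined techniques (such as the model-based methods treated in the cited reference) are generally preferred in practice because mean imputation understates variability.

```python
import numpy as np

def impute_column_means(X):
    """Replace NaN entries with the mean of the observed values in that column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)       # means over observed entries only
    filled = X.copy()
    missing = np.isnan(filled)
    # Column index of every missing cell, in the same row-major order
    # that boolean-mask assignment uses.
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    return filled

# Made-up data: four records, three variables, three missing components.
data = np.array([
    [1.0, 10.0, 100.0],
    [2.0, np.nan, 200.0],
    [np.nan, 30.0, 300.0],
    [4.0, 40.0, np.nan],
])

completed = impute_column_means(data)
print(completed)
```

All four records are retained, and every observed value is left unchanged; only the three missing cells receive the mean of their column.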

Important probability distributions

There is a set of probability distributions used in multivariate analyses that plays a role similar to that of the corresponding set of distributions used in univariate analysis when the normal distribution is appropriate to a dataset. These multivariate distributions are:

The Inverse-Wishart distribution is important in Bayesian inference, for example in Bayesian multivariate linear regression. Additionally, Hotelling's T-squared distribution is a multivariate distribution, generalising Student's t-distribution, that is used in multivariate hypothesis testing.
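
As an illustration of how Hotelling's T-squared generalises Student's t, the Python sketch below computes the one-sample T-squared statistic for a hypothesised mean vector, together with its F-scaled form, and checks numerically that with a single variable it reduces to the square of the usual t statistic. The function name `hotelling_t2` and the simulated data are assumptions made for the example, and no p-value is computed.

```python
import numpy as np

def hotelling_t2(X, mu0):
    """One-sample Hotelling's T-squared statistic and its F-scaled form.

    X: (n, p) data matrix; mu0: hypothesised mean vector of length p.
    Under the null hypothesis with multivariate normal data, f_stat follows
    an F distribution with (p, n - p) degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.atleast_2d(np.cov(X, rowvar=False))   # sample covariance, p x p
    t2 = n * (diff @ np.linalg.solve(S, diff))   # n * d' S^{-1} d
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat

rng = np.random.default_rng(0)

# Simulated sample: 50 records of 3 variables (illustrative only).
X = rng.normal(loc=[0.2, -0.1, 0.0], scale=1.0, size=(50, 3))
t2, f_stat = hotelling_t2(X, mu0=[0.0, 0.0, 0.0])
print("T^2 =", t2, " F =", f_stat)

# With a single variable (p = 1), T^2 equals the square of Student's t.
x = X[:, :1]
t = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
t2_1d, _ = hotelling_t2(x, mu0=[0.0])
print("T^2 matches t^2 for p = 1:", np.isclose(t2_1d, t**2))
```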

History

C.R. Rao made significant contributions to multivariate statistical theory throughout his career, particularly in the mid-20th century. One of his key works is the book titled "Advanced Statistical Methods in Biometric Research," published in 1952. This work laid the foundation for many concepts in multivariate statistics.[8] Anderson's 1958 textbook, An Introduction to Multivariate Statistical Analysis,[9] educated a generation of theorists and applied statisticians; Anderson's book emphasizes hypothesis testing via likelihood ratio tests and the properties of power functions: admissibility, unbiasedness and monotonicity.[10][11]

MVA was formerly discussed solely in the context of statistical theory, owing to the size and complexity of the underlying datasets and the high computational cost. With the dramatic growth of computational power, MVA now plays an increasingly important role in data analysis and has wide application in omics fields.

Applications

Software and tools

There are an enormous number of software packages and other tools for multivariate analysis, including:

References

  1. ^ a b Olkin, I.; Sampson, A. R. (2001-01-01), "Multivariate Analysis: Overview", in Smelser, Neil J.; Baltes, Paul B. (eds.), International Encyclopedia of the Social & Behavioral Sciences, Pergamon, pp. 10240–10247, ISBN 978-0-08-043076-8, retrieved 2019-09-02
  2. ^ Hidalgo, B; Goodman, M (2013). "Multivariate or multivariable regression?". Am J Public Health. 103 (1): 39–40. doi:10.2105/AJPH.2012.300897. PMC 3518362. PMID 23153131.
  3. ^ Unsophisticated analysts of bivariate Gaussian problems may find useful a crude but accurate method of gauging probability by simply taking the sum S of the N residuals' squares, subtracting the sum Sm at minimum, dividing this difference by Sm, multiplying the result by (N - 2) and taking the inverse anti-ln of half that product.
  4. ^ Contributors of the QCBS R Workshop Series. Chapter 6 Redundancy analysis | Workshop 10: Advanced Multivariate Analyses in R.
  5. ^ Van Den Wollenberg, Arnold L. (1977). "Redundancy analysis an alternative for canonical correlation analysis". Psychometrika. 42 (2): 207–219. doi:10.1007/BF02294050.
  6. ^ ter Braak, Cajo J.F. & Šmilauer, Petr (2012). Canoco reference manual and user's guide: software for ordination (version 5.0), p. 292. Microcomputer Power, Ithaca, NY.
  7. ^ J.L. Schafer (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall/CRC. ISBN 978-1-4398-2186-2.
  8. ^ Dasgupta, Anirban (2024). "C.R. Rao: Paramount statistical scientist (1920 to 2023)". Proceedings of the National Academy of Sciences. 121 (9) e2321318121. Bibcode:2024PNAS..12121318D. doi:10.1073/pnas.2321318121. PMC 10907269. PMID 38377193.
  9. ^ T.W. Anderson (1958) An Introduction to Multivariate Statistical Analysis, New York: Wiley ISBN 0471026409; 2e (1984) ISBN 0471889873; 3e (2003) ISBN 0471360910
  10. ^ Sen, Pranab Kumar; Anderson, T. W.; Arnold, S. F.; Eaton, M. L.; Giri, N. C.; Gnanadesikan, R.; Kendall, M. G.; Kshirsagar, A. M.; et al. (June 1986). "Review: Contemporary Textbooks on Multivariate Statistical Analysis: A Panoramic Appraisal and Critique". Journal of the American Statistical Association. 81 (394): 560–564. doi:10.2307/2289251. ISSN 0162-1459. JSTOR 2289251. (Pages 560–561)
  11. ^ Schervish, Mark J. (November 1987). "A Review of Multivariate Analysis". Statistical Science. 2 (4): 396–413. doi:10.1214/ss/1177013111. ISSN 0883-4237. JSTOR 2245530.
  12. ^ Huang, Biwei; Low, Charles Jia Han; Xie, Feng; Glymour, Clark; Zhang, Kun (2022-10-01). "Latent Hierarchical Causal Structure Discovery with Rank Constraints". arXiv.org. Retrieved 2025-06-09.
  13. ^ "Multivariate Regression Analysis | Stata Data Analysis Examples". stats.oarc.ucla.edu. Retrieved 2025-06-09.
  14. ^ CRAN has details on the packages available for multivariate data analysis.

Further reading

  • Johnson, Richard A.; Wichern, Dean W. (2007). Applied Multivariate Statistical Analysis (Sixth ed.). Prentice Hall. ISBN 978-0-13-187715-3.
  • KV Mardia; JT Kent; JM Bibby (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471252-5. (M.A. level "likelihood" approach)
  • A. Sen, M. Srivastava, Regression Analysis: Theory, Methods, and Applications, Springer-Verlag, Berlin, 2011 (4th printing).
  • Cook, Swayne (2007). Interactive Graphics for Data Analysis.
  • Malakooti, B. (2013). Operations and Production Systems with Multiple Objectives. John Wiley & Sons.
  • T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958.
  • Feinstein, A. R. (1996) Multivariable Analysis. New Haven, CT: Yale University Press.
  • Hair, J. F. Jr. (1995) Multivariate Data Analysis with Readings, 4th ed. Prentice-Hall.
  • Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data. CRC Press. (Advanced)
  • Sharma, S. (1996) Applied Multivariate Techniques. Wiley. (Informal, applied)
  • Izenman, Alan J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer Texts in Statistics. New York: Springer-Verlag. ISBN 9780387781884.
  • Tinsley, Howard E. A.; Brown, Steven D., eds. (2000). Handbook of Applied Multivariate Statistics and Mathematical Modeling. Academic Press. doi:10.1016/B978-0-12-691360-6.X5000-9. ISBN 978-0-12-691360-6.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Multivariate_statistics&oldid=1311796252"