Geostatistics

From Wikipedia, the free encyclopedia

Branch of statistics focusing on spatial data sets

Not to be confused with statistical geography.

Overview of different interpolation methods for the same data points of a terrain surface

Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations,^[1] it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture (especially precision farming). Geostatistics is applied in varied branches of geography, particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of efficient spatial networks. Geostatistical algorithms are incorporated in many software tools, including geographic information systems (GIS).

Background

Geostatistics is closely related to interpolation methods but extends far beyond simple interpolation problems. Geostatistical techniques rely on statistical models based on random function (or random variable) theory to model the uncertainty associated with spatial estimation and simulation.

A number of simpler interpolation methods and algorithms, such as inverse distance weighting, bilinear interpolation and nearest-neighbor interpolation, were already well known before geostatistics.^[2] Geostatistics goes beyond the interpolation problem by considering the studied phenomenon at unknown locations as a set of correlated random variables.
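As a point of comparison, inverse distance weighting can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `idw_interpolate`, the sample layout and the exponent `power=2.0` are assumptions chosen for the example.

```python
import math

def idw_interpolate(samples, target, power=2.0):
    """Inverse distance weighting estimate at `target`.

    samples: list of ((x, y), value) pairs (hypothetical layout).
    power:   distance exponent; 2.0 is an assumed, commonly used default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sample: honor the datum
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Three samples equidistant from the target, so the estimate is their mean.
samples = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((0.0, 1.0), 30.0)]
estimate = idw_interpolate(samples, (0.5, 0.5))
```

The weights depend only on distance to the target. Unlike the geostatistical methods described here, the estimate carries no model of spatial correlation and no measure of uncertainty.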

Let Z(x) be the value of the variable of interest (e.g., temperature, rainfall, piezometric level or geological facies) at a certain location x. This value is unknown. Although there exists a value at location x that could be measured, geostatistics considers this value as random since it was not measured or has not been measured yet. However, the randomness of Z(x) is not complete: it is defined by a cumulative distribution function (CDF) that depends on certain information that is known about the value Z(x):

F({\mathit {z}},\mathbf {x} )=\operatorname {Prob} \lbrace Z(\mathbf {x} )\leqslant {\mathit {z}}\mid {\text{information}}\rbrace .

Typically, if the value of Z is known at locations close to x (or in the neighborhood of x) one can constrain the CDF of Z(x) by this neighborhood: if a high spatial continuity is assumed, Z(x) can only have values similar to the ones found in the neighborhood. Conversely, in the absence of spatial continuity Z(x) can take any value. The spatial continuity of the random variables is described by a model of spatial continuity that can be either a parametric function in the case of variogram-based geostatistics, or have a non-parametric form when using other methods such as multiple-point simulation^[3] or pseudo-genetic techniques.
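In variogram-based work, a parametric model is typically fitted to an experimental variogram computed from the data. The sketch below computes the classical semivariance estimator, half the mean squared difference between values separated by a given lag. It assumes a regularly spaced 1-D transect and integer lags smaller than the number of samples; the function name and the data are illustrative only.

```python
def experimental_variogram(values, max_lag):
    """Semivariance for integer lags 1..max_lag on a regularly spaced transect.

    Assumes max_lag < len(values), so every lag has at least one pair.
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        # All pairs of samples separated by exactly `lag` grid steps.
        pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
        gamma[lag] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return gamma

transect = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma = experimental_variogram(transect, 3)
# gamma == {1: 0.5, 2: 2.0, 3: 4.5}
```

A semivariance that grows slowly with lag indicates high spatial continuity. A semivariance that is already large at the shortest lag indicates little continuity.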

By applying a single spatial model on an entire domain, one makes the assumption that Z is a stationary process, meaning that the same statistical properties are applicable on the entire domain. Several geostatistical methods provide ways of relaxing this stationarity assumption.

In this framework, one can distinguish two modeling goals:

Estimating the value for Z(x), typically by the expectation, the median or the mode of the distribution described by the CDF F(z, x). This is usually denoted as an estimation problem.
Sampling from the entire probability density function f(z, x) by actually considering each possible outcome of it at each location. This is generally done by creating several alternative maps of Z, called realizations. Consider a domain discretized in N grid nodes (or pixels). Each realization is a sample of the complete N-dimensional joint distribution function:

F(\mathbf {z} ,\mathbf {x} )=\operatorname {Prob} \lbrace Z(\mathbf {x} _{1})\leqslant z_{1},Z(\mathbf {x} _{2})\leqslant z_{2},...,Z(\mathbf {x} _{N})\leqslant z_{N}\rbrace .

In this approach, the presence of multiple solutions to the interpolation problem is acknowledged. Each realization is considered as a possible scenario of what the real variable could be. All associated workflows then consider an ensemble of realizations, and consequently an ensemble of predictions that allows for probabilistic forecasting. Therefore, geostatistics is often used to generate or update spatial models when solving inverse problems.^[4]^[5]
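The idea of drawing several realizations from a joint distribution can be illustrated with a small unconditional Gaussian simulation based on a Cholesky decomposition of the covariance matrix. This is a sketch under stated assumptions: a 1-D grid with unit spacing, an exponential covariance with assumed `sill` and `corr_length`, a constant mean, and no conditioning to measured data. All function names are hypothetical.

```python
import math
import random

def exponential_covariance(h, sill=1.0, corr_length=3.0):
    """Assumed covariance model: sill * exp(-|h| / corr_length)."""
    return sill * math.exp(-abs(h) / corr_length)

def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def simulate_realizations(n_nodes, n_realizations, mean=0.0, seed=0):
    """Draw unconditional realizations on a 1-D grid with unit spacing."""
    cov = [[exponential_covariance(i - j) for j in range(n_nodes)]
           for i in range(n_nodes)]
    lower = cholesky(cov)
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_realizations):
        # Independent standard normal draws, correlated through the factor.
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_nodes)]
        field = [mean + sum(lower[i][k] * noise[k] for k in range(i + 1))
                 for i in range(n_nodes)]
        realizations.append(field)
    return realizations

realizations = simulate_realizations(n_nodes=20, n_realizations=5)
```

Each element of `realizations` is one sample of the N-dimensional joint distribution, here with N = 20. Statistics computed across the ensemble, such as the node-wise mean or spread, give the probabilistic view described above.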

A number of methods exist for both geostatistical estimation and multiple-realization approaches. Several reference books provide a comprehensive overview of the discipline.^[2]^[6]^[7]^[8]^[9]^[10]^[11]^[12]^[13]^[14]^[15]

Methods

Estimation

Kriging

Main article: Kriging

Kriging is a group of geostatistical techniques to interpolate the value of a random field (e.g., the elevation, z, of the landscape as a function of the geographic location) at an unobserved location from observations of its value at nearby locations.
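One member of this group, ordinary kriging, can be sketched as follows. The example is illustrative only: it assumes 1-D locations, an exponential covariance with assumed `sill` and `corr_length`, and an unknown constant mean handled through a Lagrange multiplier. The function names are hypothetical, and the small linear solver stands in for a numerical library.

```python
import math

def covariance(h, sill=1.0, corr_length=2.0):
    """Assumed covariance model: sill * exp(-|h| / corr_length)."""
    return sill * math.exp(-abs(h) / corr_length)

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def ordinary_kriging(locations, values, target):
    """Return the kriging estimate and kriging variance at `target`."""
    n = len(locations)
    # Data-to-data covariances, bordered by the unbiasedness constraint.
    system = [[covariance(xi - xj) for xj in locations] + [1.0]
              for xi in locations]
    system.append([1.0] * n + [0.0])
    # Data-to-target covariances, plus the constraint that weights sum to 1.
    rhs = [covariance(xi - target) for xi in locations] + [1.0]
    solution = solve_linear_system(system, rhs)
    weights, lagrange = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = (covariance(0.0)
                - sum(w * c for w, c in zip(weights, rhs[:n]))
                - lagrange)
    return estimate, variance

locations = [0.0, 1.0, 3.0]
values = [2.0, 3.0, 1.0]
estimate, variance = ordinary_kriging(locations, values, target=1.0)
```

Because the target here coincides with an observation, the estimate reproduces the observed value and the kriging variance is zero up to rounding. At unobserved locations the variance is positive and grows with distance from the data.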

Bayesian estimation

Main article: Bayesian inference

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update a probability model as more evidence or information becomes available. Bayesian inference is playing an increasingly important role in geostatistics.^[16] Bayesian estimation implements kriging through a spatial process, most commonly a Gaussian process, and updates the process using Bayes' theorem to calculate its posterior. Methods for high-dimensional Bayesian geostatistics have also been developed.^[17]
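For the Gaussian process case, the posterior has a closed form, shown here as a sketch. The notation is assumed for this example and does not come from the cited references: the prior has mean function m and covariance function k, the observations z_i = Z(x_i) + e_i carry independent Gaussian noise of variance \sigma^2, K is the matrix with entries k(x_i, x_j), k_0 is the vector with entries k(x_i, x_0), and m is the vector of prior means m(x_i).

```latex
% Posterior mean and variance of Z at an unobserved location x_0
\mathbb{E}[Z(\mathbf{x}_0) \mid \mathbf{z}]
  = m(\mathbf{x}_0)
  + \mathbf{k}_0^{\top} (\mathbf{K} + \sigma^2 \mathbf{I})^{-1} (\mathbf{z} - \mathbf{m})

\operatorname{Var}[Z(\mathbf{x}_0) \mid \mathbf{z}]
  = k(\mathbf{x}_0, \mathbf{x}_0)
  - \mathbf{k}_0^{\top} (\mathbf{K} + \sigma^2 \mathbf{I})^{-1} \mathbf{k}_0
```

With a known mean and \sigma^2 = 0, these expressions coincide with the simple kriging estimate and simple kriging variance.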

Finite difference method

Considering the principle of conservation of probability, recurrent difference equations (finite difference equations) were used in conjunction with lattices to compute probabilities quantifying uncertainty about geological structures. This procedure is a numerical alternative to Markov chains and Bayesian models.^[18]

Simulation

Aggregation
Disaggregation
Turning bands
Cholesky decomposition
Truncated Gaussian
Plurigaussian
Annealing
Spectral simulation
Sequential Indicator
Sequential Gaussian
Dead leaves
Transition probabilities
Markov chain geostatistics
Support vector machine
Boolean simulation
Genetic models
Pseudo-genetic models
Cellular automata
Multiple-Point Geostatistics

Definitions and tools

Notes

[1] Krige, Danie G. (1951). "A statistical approach to some basic mine valuation problems on the Witwatersrand". J. of the Chem., Metal. and Mining Soc. of South Africa 52 (6): 119–139.
[2] Isaaks, E. H. and Srivastava, R. M. (1989), An Introduction to Applied Geostatistics, Oxford University Press, New York, USA.
[3] Mariethoz, Gregoire, Caers, Jef (2014). Multiple-point geostatistics: modeling with training images. Wiley-Blackwell, Chichester, UK, 364 p.
[4] Hansen, T.M., Journel, A.G., Tarantola, A. and Mosegaard, K. (2006). "Linear inverse Gaussian theory and geostatistics", Geophysics 71.
[5] Kitanidis, P.K. and Vomvoris, E.G. (1983). "A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations", Water Resources Research 19(3):677–690.
[6] Remy, N., et al. (2009), Applied Geostatistics with SGeMS: A User's Guide, 284 pp., Cambridge University Press, Cambridge.
[7] Deutsch, C.V., Journel, A.G. (1997). GSLIB: Geostatistical Software Library and User's Guide (Applied Geostatistics Series), Second Edition, Oxford University Press, 369 pp., http://www.gslib.com/
[8] Chilès, J.-P., and P. Delfiner (1999), Geostatistics - Modeling Spatial Uncertainty, John Wiley & Sons, Inc., New York, USA.
[9] Lantuéjoul, C. (2002), Geostatistical simulation: Models and algorithms, 232 pp., Springer, Berlin.
[10] Journel, A. G. and Huijbregts, C.J. (1978), Mining Geostatistics, Academic Press. ISBN 0-12-391050-1
[11] Kitanidis, P.K. (1997), Introduction to Geostatistics: Applications in Hydrogeology, Cambridge University Press.
[12] Wackernagel, H. (2003). Multivariate geostatistics, Third edition, Springer-Verlag, Berlin, 387 pp.
[13] Pyrcz, M. J. and Deutsch, C.V. (2014). Geostatistical Reservoir Modeling, 2nd Edition, Oxford University Press, 448 pp.
[14] Tahmasebi, P., Hezarkhani, A., Sahimi, M. (2012). "Multiple-point geostatistical modeling based on the cross-correlation functions", Computational Geosciences 16(3):779–797.
[15] Schnetzler, Manu. "Statios - WinGslib". Archived from the original on 2015-05-11. Retrieved 2005-10-10.
[16] Banerjee, S., Carlin, B.P., and Gelfand, A.E. (2014). Hierarchical Modeling and Analysis for Spatial Data, Second Edition. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. ISBN 9781439819173
[17] Banerjee, Sudipto (2017). "High-Dimensional Bayesian Geostatistics". Bayesian Anal. 12 (2): 583–614. doi:10.1214/17-BA1056R. https://projecteuclid.org/euclid.ba/1494921642
[18] Cardenas, IC (2023). "A two-dimensional approach to quantify stratigraphic uncertainty from borehole data using non-homogeneous random fields". Engineering Geology. doi:10.1016/j.enggeo.2023.107001.
