Hopkins statistic

From Wikipedia, the free encyclopedia

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.[2] With the statistic defined as in the Definition section, its value approaches 1 if individuals are aggregated, and it tends to 0.5 if they are randomly distributed.[3]

Preliminaries


A typical formulation of the Hopkins statistic follows.[2]

Let $X$ be the set of $n$ data points.
Generate a random sample $\overset{\sim}{X}$ of $m \ll n$ data points sampled without replacement from $X$.
Generate a set $Y$ of $m$ uniformly randomly distributed data points.
Define two distance measures,
$u_i$, the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
$w_i$, the minimum distance of $\overset{\sim}{x}_i \in \overset{\sim}{X} \subseteq X$ to its nearest neighbour $x_j \in X$, $\overset{\sim}{x}_i \neq x_j$.
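
The steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names `nearest_distance` and `hopkins_distances` are hypothetical, the metric is assumed to be Euclidean, and the uniform points $Y$ are assumed to be drawn from the axis-aligned bounding box of $X$ (the text does not fix the sampling window).

```python
import math
import random

def nearest_distance(point, others):
    """Smallest Euclidean distance from `point` to any point in `others`."""
    return min(math.dist(point, q) for q in others)

def hopkins_distances(X, m, rng):
    """Return (u, w) for a data set X given as a list of equal-length tuples."""
    n, d = len(X), len(X[0])
    # Sample m indices without replacement: this is the subsample X~.
    idx = rng.sample(range(n), m)
    # Assumption: Y is uniform over the bounding box of X.
    lo = [min(p[k] for p in X) for k in range(d)]
    hi = [max(p[k] for p in X) for k in range(d)]
    Y = [tuple(rng.uniform(lo[k], hi[k]) for k in range(d)) for _ in range(m)]
    # u_i: distance from each uniform point y_i to its nearest neighbour in X.
    u = [nearest_distance(y, X) for y in Y]
    # w_i: distance from each sampled point to its nearest *other* point in X.
    # The point is excluded by index, so exact duplicates in X give w_i = 0.
    w = [nearest_distance(X[i], X[:i] + X[i + 1:]) for i in idx]
    return u, w
```

The brute-force nearest-neighbour search costs on the order of $m \cdot n$ distance evaluations, which is usually acceptable because $m \ll n$; a spatial index could replace it for large data sets.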

Definition


With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:[4]

$$H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d}$$

Under the null hypothesis, this statistic has a Beta($m$, $m$) distribution.
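
As a rough illustration, the statistic and a p-value under the Beta($m$, $m$) null can be computed with the Python standard library alone. The function names are hypothetical. The one-sided test shown here (rejecting for large $H$, that is, against clustering) is an assumption about the alternative of interest, not something the definition prescribes. The tail probability uses the standard identity that, for integer $a$ and $b$, the Beta($a$, $b$) cumulative distribution function at $h$ equals the probability that a Binomial($a + b - 1$, $h$) variable is at least $a$.

```python
import math

def hopkins_statistic(u, w, d):
    """H = sum(u_i^d) / (sum(u_i^d) + sum(w_i^d))."""
    su = sum(x ** d for x in u)
    sw = sum(x ** d for x in w)
    return su / (su + sw)

def beta_mm_upper_tail(h, m):
    """P(B >= h) for B ~ Beta(m, m) with integer m.

    Equals P(Binomial(2m - 1, h) <= m - 1) by the beta-binomial identity.
    """
    n = 2 * m - 1
    return sum(math.comb(n, j) * h ** j * (1 - h) ** (n - j) for j in range(m))

# Toy distances for m = 2 points in d = 2 dimensions (made-up values).
u = [0.3, 0.4]
w = [0.1, 0.2]
H = hopkins_statistic(u, w, d=2)       # 0.25 / (0.25 + 0.05) = 5/6
p = beta_mm_upper_tail(H, m=len(u))    # 16/216, about 0.074
```

A small p-value suggests the uniform-randomness null is implausible in the direction of clustering; with only two sampled points, as in this toy example, the test has very little power.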

Notes and references

  1. Hopkins, Brian; Skellam, John Gordon (1954). "A new method for determining the type of distribution of plant individuals". Annals of Botany. 18 (2). Annals Botany Co: 213–227. doi:10.1093/oxfordjournals.aob.a083391.
  2. Banerjee, A. (2004). "Validating clusters using the Hopkins statistic". 2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542). Vol. 1. pp. 149–153. doi:10.1109/FUZZY.2004.1375706. ISBN 0-7803-8353-2. S2CID 36701919.
  3. Aggarwal, Charu C. (2015). Data Mining. Cham: Springer International Publishing. p. 158. doi:10.1007/978-3-319-14142-8. ISBN 978-3-319-14141-1. S2CID 13595565.
  4. Cross, G.R.; Jain, A.K. (1982). "Measurement of clustering tendency". pp. 315–320. doi:10.1016/B978-0-08-027618-2.50054-1. ISBN 978-0-08-027618-2.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Hopkins_statistic&oldid=1268036136"