The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.[2] If individuals are aggregated, then its value approaches 0, and if they are randomly distributed, the value tends to 0.5.[3]
A typical formulation of the Hopkins statistic follows.[2]
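- Let $X$ be the set of $n$ data points.
- Generate a random sample $\tilde{X}$ of $m \ll n$ data points sampled without replacement from $X$.
- Generate a set $Y$ of $m$ uniformly randomly distributed data points.
- Define two distance measures,
  - $u_i$, the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
  - $w_i$, the minimum distance of $\tilde{x}_i \in \tilde{X} \subseteq X$ to its nearest neighbour $x_j \in X$, $\tilde{x}_i \neq x_j$.

With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:[4]

$$H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d}$$

Under the null hypothesis, this statistic has a Beta(m, m) distribution. In this form, aggregated data give large $u_i$ relative to $w_i$ and push $H$ towards 1; formulations that report the complement $1 - H$ instead give values near 0 for aggregated data.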
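The procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation, and it rests on several assumptions that are not part of the formulation itself: the function name `hopkins_statistic` is hypothetical, the metric is Euclidean, the uniform points are drawn from the axis-aligned bounding box of the data, nearest neighbours are found by brute force, and the statistic is taken in the form where the distances from the uniform points appear in the numerator.

```python
import math
import random


def hopkins_statistic(points, m, rng=None):
    """Sketch of the Hopkins statistic for a list of d-dimensional tuples.

    Assumes Euclidean distance and a bounding-box sampling window.
    """
    rng = rng or random.Random()
    n = len(points)
    d = len(points[0])
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")

    # Assumption: the sampling window is the bounding box of the data.
    lows = [min(p[k] for p in points) for k in range(d)]
    highs = [max(p[k] for p in points) for k in range(d)]

    # m points drawn uniformly at random from the sampling window.
    uniform_points = [
        tuple(rng.uniform(lows[k], highs[k]) for k in range(d))
        for _ in range(m)
    ]
    # Indices of m data points sampled without replacement.
    sample_idx = rng.sample(range(n), m)

    # Distance from each uniform point to its nearest data point.
    u = [min(math.dist(y, x) for x in points) for y in uniform_points]
    # Distance from each sampled data point to its nearest other data
    # point (the point itself is excluded by index).
    w = [
        min(math.dist(points[i], points[j]) for j in range(n) if j != i)
        for i in sample_idx
    ]

    sum_u = sum(ui ** d for ui in u)
    sum_w = sum(wi ** d for wi in w)
    return sum_u / (sum_u + sum_w)


# Illustrative data: two tight Gaussian blobs versus uniform noise.
rng = random.Random(0)
clustered = [
    (rng.gauss(cx, 0.01), rng.gauss(cy, 0.01))
    for cx, cy in [(0.2, 0.2), (0.8, 0.8)]
    for _ in range(100)
]
uniform = [(rng.random(), rng.random()) for _ in range(200)]
print(hopkins_statistic(clustered, 20, rng))
print(hopkins_statistic(uniform, 20, rng))
```

With this form of the statistic, the clustered sample would be expected to score much closer to 1 than the uniform sample, which should land somewhere around 0.5. The brute-force search costs on the order of m·n distance evaluations, so a spatial index would be a more sensible choice for large data sets.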
Notes and references