nredell/RARI


A python package which implements a distance-based extension of the adjusted Rand index for the supervised validation of 2 cluster analysis solutions

License

MIT license


package.rari

rari is a Python implementation of Pinto et al.'s ranked adjusted Rand index (RARI) from Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement. RARI is an extension of the adjusted Rand index (ARI) that measures the agreement between two independent clustering solutions while incorporating distances between instances/clusters from each solution.

RARI = 1: Perfect agreement between cluster solutions 'A' and 'B'. Identical cluster partitions and equally ranked relative distances between clusters in cluster solutions 'A' and 'B'.
RARI = 0: No agreement between cluster solutions 'A' and 'B'. Only occurs when, in cluster solution 'A', all instances are in the same cluster and, in cluster solution 'B', all instances are in their own cluster and all clusters are equidistant from each other.

Roughly speaking, the benefit of RARI is in penalizing the ARI when a given pair of instances is close together in cluster solution 'A' and far apart in cluster solution 'B'.
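For reference, the ARI that RARI extends depends only on the cluster labels, never on distances. The block below is a minimal pure-Python sketch of the standard ARI formula computed from the contingency table. The helper name `adjusted_rand_index` is an assumption made for illustration and is not part of `rari`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI computed from the contingency table of two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)

    # Number of instance pairs inside each contingency cell, row, and column.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two instances: nothing to disagree on
    expected = sum_a * sum_b / total_pairs
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate case: both solutions are a single cluster,
        # or both place every instance in its own cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Swapping the cluster names does not change the score.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0

# One cluster in 'A' versus all singletons in 'B'.
print(adjusted_rand_index([0, 0, 0, 0], [0, 1, 2, 3]))  # 0.0
```

Because only the labels enter this calculation, two pairs of solutions with identical partitions but very different cluster spacing receive the same ARI. That is the gap the distance matrices passed to `rari` are meant to fill.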

Lightning Example

Below is a comparison of the agreement between hierarchical and k-means clustering solutions on the iris data set. The same distance matrix is used to calculate pairwise distances between each iris instance, but this is not a requirement.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import pairwise_distances
from rari import rari

X = load_iris().data

model_1 = AgglomerativeClustering(n_clusters=3, linkage='ward')
x = model_1.fit_predict(X)

model_2 = KMeans(n_clusters=3)
y = model_2.fit_predict(X)

dist_x = pairwise_distances(X, metric='euclidean')
dist_y = pairwise_distances(X, metric='euclidean')

rari(x, y, dist_x, dist_y)
```

Out[1]: .975

Install

Development

pip install git+https://github.com/nredell/rari

Intuition

Below is Figure 1 from Pinto et al.'s article, which demonstrates the impact of inter-cluster distances on the RARI metric as compared to, say, the ARI.

Examples

Example 1: ARI vs. RARI, Few Clusters, High Agreement

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score, pairwise_distances
from rari import rari

X, y = make_blobs(n_samples=[50, 50, 50], n_features=2, cluster_std=1.0,
                  center_box=(-5.0, 5.0), shuffle=True, random_state=224)

data = pd.DataFrame(np.hstack([X, y[:, np.newaxis]]), columns=["X1", "X2", "Cluster"])

model_1 = AgglomerativeClustering(n_clusters=3, linkage='ward')
x = model_1.fit_predict(X)

model_2 = KMeans(n_clusters=3)
y = model_2.fit_predict(X)

dist_x = pairwise_distances(X, metric='euclidean')
dist_y = pairwise_distances(X, metric='euclidean')
```

```python
adjusted_rand_score(x, y)
rari(x, y, dist_x, dist_y)
```

ARI: .83

RARI: .89

Example 2: ARI vs. RARI, A New Data Point

The toy 1D example below illustrates how the dynamic RARI changes as the distance between clusters changes while the static ARI remains the same.

Imagine that the moving data point represents a new data point added to the data set, at which point each of 2 models is re-run and the clusters are re-labeled. For the sake of illustration, the labels for this new data point from each model are held constant through each of the 11 analyses to emphasize the impact of cluster spacing. In a real problem, it's likely that the moving data point would be classified as a '2' as it approaches the yellow '2' on the right-hand side of each plot. However, this change of labels may not even occur in a simple 2D example with a method like spectral clustering. Our intuitions will fail us in higher dimensions, but RARI will account for these changes in cluster orientation if so desired.
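The setup can be mimicked numerically without the plots. The sketch below uses an invented 1D layout (the coordinates, step size, and helper name `nearest_distance` are assumptions, not the values behind the original figures). The moving point keeps label 1 in all 11 runs, so any label-only score such as the ARI sees the same input every time, while the distances that feed a distance-aware measure keep changing.

```python
# Hypothetical 1D layout; coordinates are invented for illustration only.
cluster_1 = [0.0, 1.0]    # fixed points labeled 1
cluster_2 = [10.0, 11.0]  # fixed points labeled 2


def nearest_distance(point, others):
    """Distance from `point` to the closest value in `others`."""
    return min(abs(point - o) for o in others)


rows = []
for step in range(11):           # 11 re-runs; the moving point stays labeled 1
    position = 2.0 + 0.5 * step  # moving point slides from 2.0 toward cluster 2
    rows.append((position,
                 nearest_distance(position, cluster_1),
                 nearest_distance(position, cluster_2)))

for position, to_own, to_other in rows:
    print(f"x={position:4.1f}  nearest in cluster 1={to_own:4.1f}  "
          f"nearest in cluster 2={to_other:4.1f}")
```

By the last run the point is closer to cluster 2 than to the cluster it is labeled with, yet the label vector is unchanged. That is the kind of shift a label-only index cannot register.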

Implementation Details

At present, inter-cluster distances are based on the Euclidean distance between pairs of instances in dist_x and dist_y. That is to say, even if the input pairwise distance matrices are, for example, cosine and Manhattan, the inter-cluster distance ranks are still based on a Euclidean, complete linkage measure of these pairwise distances. This will be relaxed in the future with support for additional input arguments.
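To make "complete linkage" and "distance ranks" concrete, here is a minimal pure-Python sketch. It is not the code inside `rari`: the function names `complete_linkage_distances` and `rank_cluster_pairs` are hypothetical, and the tie handling (tied distances share a rank) is an assumption. It takes cluster labels and a pairwise distance matrix, treats the distance between two clusters as the largest distance between any of their members, and then ranks the cluster pairs from closest to farthest.

```python
from itertools import combinations


def complete_linkage_distances(labels, dist):
    """Map each pair of cluster labels to the largest pairwise distance
    between their members (complete linkage)."""
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)

    result = {}
    for a, b in combinations(sorted(members), 2):
        result[(a, b)] = max(dist[i][j] for i in members[a] for j in members[b])
    return result


def rank_cluster_pairs(cluster_dist):
    """Rank cluster pairs from closest (rank 1) to farthest; ties share a rank."""
    distinct = sorted(set(cluster_dist.values()))
    rank_of = {d: r for r, d in enumerate(distinct, start=1)}
    return {pair: rank_of[d] for pair, d in cluster_dist.items()}


# Four points on a line at 0, 1, 5 and 9, grouped into three clusters.
points = [0.0, 1.0, 5.0, 9.0]
labels = [0, 0, 1, 2]
dist = [[abs(p - q) for q in points] for p in points]

cluster_dist = complete_linkage_distances(labels, dist)
print(cluster_dist)                      # {(0, 1): 5.0, (0, 2): 9.0, (1, 2): 4.0}
print(rank_cluster_pairs(cluster_dist))  # {(0, 1): 2, (0, 2): 3, (1, 2): 1}
```

The same functions accept any square matrix indexable as `dist[i][j]`, so the output of `pairwise_distances` could be passed in directly.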
