gbuesing/kmeans-clusterer

k-means clustering in Ruby

License

MIT license

KMeansClusterer

k-means clustering in Ruby. Uses NArray under the hood for fast calculations.

Jump to the examples directory to see this in action.

Features

Runs multiple clustering attempts to find optimal solution (single runs are susceptible to falling into non-optimal local minima)
Initializes centroids via k-means++ algorithm, for faster convergence
Calculates silhouette score for evaluation
Option to scale data before clustering, so that output isn't biased by different feature scales
Works with high-dimensional data
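
To make the k-means++ feature above more concrete, here is a minimal plain-Ruby sketch of the seeding idea: the first centroid is picked uniformly at random, and each later centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. This is an illustration only, not the gem's NArray-based code; the names `kmpp_init`, `weighted_pick`, and `squared_distance` are hypothetical helpers invented for this example.

```ruby
# Illustrative k-means++ seeding in plain Ruby.
# Assumption: this is NOT the gem's implementation; all helper names are hypothetical.

def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Pick an index with probability proportional to its weight.
def weighted_pick(weights, rng)
  target = rng.rand * weights.sum
  weights.each_with_index do |w, i|
    target -= w
    return i if target < 0
  end
  # Floating-point rounding (or all-zero weights) can leave no index selected;
  # fall back to the index with the largest weight.
  weights.each_index.max_by { |i| weights[i] }
end

# Returns k initial centroids chosen from points.
# rng is a Random instance so a run can be reproduced with a fixed seed.
def kmpp_init(points, k, rng: Random.new)
  centroids = [points[rng.rand(points.length)]]
  while centroids.length < k
    # Squared distance from each point to its nearest already-chosen centroid.
    weights = points.map do |p|
      centroids.map { |c| squared_distance(p, c) }.min
    end
    centroids << points[weighted_pick(weights, rng)]
  end
  centroids
end

data = [[40.71, -74.01], [34.05, -118.24], [39.29, -76.61],
        [45.52, -122.68], [38.9, -77.04], [36.11, -115.17]]

p kmpp_init(data, 2, rng: Random.new(42))
```

Because already-chosen points have weight zero, they are not picked again as long as the data contains at least k distinct points, which tends to spread the initial centroids across the data.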

Install

gem install kmeans-clusterer

Usage

Simple example:

    require 'kmeans-clusterer'

    data = [[40.71,-74.01],[34.05,-118.24],[39.29,-76.61],
            [45.52,-122.68],[38.9,-77.04],[36.11,-115.17]]

    labels = ['New York', 'Los Angeles', 'Baltimore',
              'Portland', 'Washington DC', 'Las Vegas']

    k = 2 # find 2 clusters in data

    kmeans = KMeansClusterer.run k, data, labels: labels, runs: 5

    kmeans.clusters.each do |cluster|
      puts cluster.id.to_s + '. ' +
           cluster.points.map(&:label).join(", ") + "\t" +
           cluster.centroid.to_s
    end

    # Use existing clusters for prediction with new data:
    predicted = kmeans.predict [[41.85,-87.65]] # Chicago
    puts "\nClosest cluster to Chicago: #{predicted[0]}"

    # Clustering quality score. Value between -1.0..1.0 (1.0 is best)
    puts "\nSilhouette score: #{kmeans.silhouette.round(2)}"

Output of simple example:

    0. New York, Baltimore, Washington DC	[39.63, -75.89]
    1. Los Angeles, Portland, Las Vegas	[38.56, -118.7]

    Closest cluster to Chicago: 0

    Silhouette score: 0.91
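
For readers unfamiliar with the silhouette score reported above, the following plain-Ruby sketch shows the textbook definition: for each point, `a` is the mean distance to the other points in its own cluster, `b` is the lowest mean distance to the points of any other cluster, and the point's score is `(b - a) / max(a, b)`. The overall score is the mean over all points. This is a sketch under assumptions: it uses Euclidean distance on raw coordinates, and the helper names (`euclidean`, `mean_distance`, `silhouette`) are hypothetical. The gem's own calculation may differ in detail, so the value printed here is not expected to match the output above exactly.

```ruby
# Illustrative silhouette score in plain Ruby.
# Assumption: textbook definition with Euclidean distance; not the gem's code.

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def mean_distance(point, others)
  others.sum { |q| euclidean(point, q) } / others.length
end

# clusters: array of clusters, each an array of points. Needs at least 2 clusters.
def silhouette(clusters)
  scores = []
  clusters.each_with_index do |cluster, ci|
    rivals = clusters.each_index.reject { |j| j == ci }.map { |j| clusters[j] }
    cluster.each_index do |pi|
      # By convention, a point alone in its cluster scores 0.
      if cluster.length == 1
        scores << 0.0
        next
      end
      point = cluster[pi]
      same = cluster.each_index.reject { |j| j == pi }.map { |j| cluster[j] }
      a = mean_distance(point, same)                              # cohesion
      b = rivals.map { |other| mean_distance(point, other) }.min  # separation
      denom = [a, b].max
      scores << (denom.zero? ? 0.0 : (b - a) / denom)
    end
  end
  scores.sum / scores.length
end

# The two clusters from the simple example, grouped by hand.
east = [[40.71, -74.01], [39.29, -76.61], [38.9, -77.04]]
west = [[34.05, -118.24], [45.52, -122.68], [36.11, -115.17]]

puts "Silhouette (sketch): #{silhouette([east, west]).round(2)}"
```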

Options

The following options can be passed in to KMeansClusterer.run:

option	default	description
:labels	nil	optional array of Ruby objects to collate with data array
:runs	10	number of times to run kmeans
:log	false	print stats after each run
:init	:kmpp	algorithm for picking initial cluster centroids. Accepts :kmpp, :random, or an array of k centroids
:scale_data	false	scales features before clustering using formula (data - mean) / std
:float_precision	:double	float precision to use. :double or :single
:max_iter	300	max iterations per run
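
The :scale_data formula in the table, (data - mean) / std, is applied per feature (column). The plain-Ruby sketch below shows what that transformation does to a small data set. It is an illustration under assumptions: `standardize` is a hypothetical helper, it uses the population standard deviation (dividing by n), and a constant column is mapped to 0.0 to avoid division by zero. The gem may make different choices on both points.

```ruby
# Illustrative per-feature standardization: (x - mean) / std.
# Assumptions: population standard deviation; constant columns become 0.0.
# Not the gem's implementation.

def standardize(data)
  # Compute [mean, std] for every column (feature).
  stats = data.transpose.map do |col|
    mean = col.sum / col.length.to_f
    variance = col.sum { |x| (x - mean)**2 } / col.length
    [mean, Math.sqrt(variance)]
  end

  data.map do |row|
    row.each_with_index.map do |x, j|
      mean, std = stats[j]
      std.zero? ? 0.0 : (x - mean) / std
    end
  end
end

# Two features on very different scales end up on the same scale.
rows = [[1.0, 100.0], [3.0, 300.0]]
p standardize(rows)
```

Without scaling, a feature measured in large units dominates the Euclidean distance, so clusters end up determined almost entirely by that feature.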
