gbuesing/kmeans-clusterer
k-means clustering in Ruby. Uses NArray under the hood for fast calculations.
Jump to the examples directory to see this in action.
- Runs multiple clustering attempts and keeps the best solution found (a single run can get stuck in a non-optimal local minimum)
- Initializes centroids via the k-means++ algorithm, for faster convergence
- Calculates a silhouette score for evaluation
- Option to scale data before clustering, so that output isn't biased by different feature scales
- Works with high-dimensional data
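
To make the k-means++ initialization mentioned above more concrete, here is a minimal plain-Ruby sketch of the seeding step. It is not the gem's implementation (which uses NArray); the names `squared_distance` and `kmeans_pp_init` are hypothetical and chosen only for illustration. The first centroid is picked uniformly at random, and each later one is picked with probability proportional to its squared distance from the nearest centroid chosen so far.

```ruby
# Squared Euclidean distance between two points (arrays of numbers).
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Hypothetical helper: choose k initial centroids from points, k-means++ style.
def kmeans_pp_init(points, k, rng: Random.new)
  # First centroid: uniform random choice.
  centroids = [points[rng.rand(points.length)]]

  while centroids.length < k
    # Weight of each point = squared distance to its nearest chosen centroid.
    weights = points.map { |p| centroids.map { |c| squared_distance(p, c) }.min }
    total = weights.sum

    if total.zero?
      # Every point coincides with a centroid: fall back to a uniform choice.
      centroids << points[rng.rand(points.length)]
      next
    end

    # Weighted sampling: walk the cumulative weights until we pass a
    # random threshold in [0, total).
    threshold = rng.rand * total
    cumulative = 0.0
    chosen = points[weights.index(weights.max)] # guard against rounding error
    points.each_with_index do |p, i|
      cumulative += weights[i]
      if cumulative > threshold
        chosen = p
        break
      end
    end
    centroids << chosen
  end

  centroids
end

data = [[40.71, -74.01], [34.05, -118.24], [39.29, -76.61],
        [45.52, -122.68], [38.9, -77.04], [36.11, -115.17]]
seeds = kmeans_pp_init(data, 2, rng: Random.new(42))
```

Because points far from every existing centroid get larger weights, the seeds tend to be spread out, which is why this scheme usually converges in fewer iterations than purely random seeding.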
```
gem install kmeans-clusterer
```

Simple example:
```ruby
require 'kmeans-clusterer'

data = [[40.71, -74.01], [34.05, -118.24], [39.29, -76.61],
        [45.52, -122.68], [38.9, -77.04], [36.11, -115.17]]

labels = ['New York', 'Los Angeles', 'Baltimore',
          'Portland', 'Washington DC', 'Las Vegas']

k = 2 # find 2 clusters in data

kmeans = KMeansClusterer.run k, data, labels: labels, runs: 5

kmeans.clusters.each do |cluster|
  puts cluster.id.to_s + '. ' +
       cluster.points.map(&:label).join(", ") + "\t" +
       cluster.centroid.to_s
end

# Use existing clusters for prediction with new data:
predicted = kmeans.predict [[41.85, -87.65]] # Chicago
puts "\nClosest cluster to Chicago: #{predicted[0]}"

# Clustering quality score. Value between -1.0..1.0 (1.0 is best)
puts "\nSilhouette score: #{kmeans.silhouette.round(2)}"
```
Output of simple example:
```
0. New York, Baltimore, Washington DC	[39.63, -75.89]
1. Los Angeles, Portland, Las Vegas	[38.56, -118.7]

Closest cluster to Chicago: 0

Silhouette score: 0.91
```

The following options can be passed in to `KMeansClusterer.run`:
| option | default | description |
|---|---|---|
| :labels | nil | optional array of Ruby objects to collate with data array |
| :runs | 10 | number of times to run kmeans |
| :log | false | print stats after each run |
| :init | :kmpp | algorithm for picking initial cluster centroids. Accepts :kmpp, :random, or an array of k centroids |
| :scale_data | false | scales features before clustering using formula (data - mean) / std |
| :float_precision | :double | float precision to use. :double or :single |
| :max_iter | 300 | max iterations per run |
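
As an illustration of the `(data - mean) / std` formula listed for `:scale_data`, the following plain-Ruby sketch standardizes each feature column. It is only a sketch under stated assumptions: the helper name `scale_features` is hypothetical, it uses the population standard deviation (dividing by n), and it leaves constant columns unscaled to avoid division by zero. The gem may make different choices on both points.

```ruby
# Hypothetical helper: standardize each column of a row-major data array.
def scale_features(data)
  # Per-column [mean, std] pairs.
  stats = data.transpose.map do |col|
    mean = col.sum.to_f / col.length
    variance = col.sum { |v| (v - mean)**2 } / col.length # population variance (assumption)
    std = Math.sqrt(variance)
    [mean, std.zero? ? 1.0 : std] # constant column: divide by 1 instead of 0
  end

  data.map do |row|
    row.each_with_index.map { |v, j| (v - stats[j][0]) / stats[j][1] }
  end
end

raw = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = scale_features(raw)
```

In `raw`, the second feature is 100 times larger than the first, so it would dominate Euclidean distances. After scaling, both columns have mean 0 and unit standard deviation and contribute equally.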