
Commit 569fd95

Simplify k-Means clustering algorithm.

1 parent b7cd425 · commit 569fd95

5 files changed: +107 −116 lines changed

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -147,7 +147,7 @@ a set of rules that precisely define a sequence of operations.
   * **Machine Learning**
     * `B` [NanoNeuron](https://github.com/trekhleb/nano-neuron) - 7 simple JS functions that illustrate how machines can actually learn (forward/backward propagation)
     * `B` [k-NN](src/algorithms/ml/knn) - k-nearest neighbors classification algorithm
-    * `B` [k-Means](src/algorithms/ml/kmeans) - k-Means clustering algorithm
+    * `B` [k-Means](src/algorithms/ml/k-means) - k-Means clustering algorithm
   * **Uncategorized**
     * `B` [Tower of Hanoi](src/algorithms/uncategorized/hanoi-tower)
     * `B` [Square Matrix Rotation](src/algorithms/uncategorized/square-matrix-rotation) - in-place algorithm
```

src/algorithms/ml/kmeans/README.md renamed to src/algorithms/ml/k-means/README.md

Lines changed: 7 additions & 7 deletions

```diff
@@ -1,10 +1,10 @@
 # k-Means Algorithm

-The **k-Means algorithm** is an unsupervised Machine Learning algorithm. It's a clustering algorithm, which groups the sample data on the basis of similarity between dimentions of vectors.
+The **k-Means algorithm** is an unsupervised Machine Learning algorithm. It's a clustering algorithm, which groups the sample data on the basis of similarity between dimensions of vectors.

-In k-Means classification, the output is a set of classess asssigned to each vector. Each cluster location is continously optimized in order to get the accurate locations of each cluster such that they represent each group clearly.
+In k-Means classification, the output is a set of classes assigned to each vector. Each cluster location is continuously optimized in order to get the accurate locations of each cluster such that they represent each group clearly.

-The idea is to calculate the similarity between cluster location and data vectors, and reassign clusters based on it. [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) is used mostly for this task.
+The idea is to calculate the similarity between cluster location and data vectors, and reassign clusters based on it. [Euclidean distance](https://github.com/trekhleb/javascript-algorithms/tree/master/src/algorithms/math/euclidean-distance) is used mostly for this task.

 ![Euclidean distance between two points](https://upload.wikimedia.org/wikipedia/commons/5/55/Euclidean_distance_2d.svg)

@@ -13,19 +13,19 @@ _Image source: [Wikipedia](https://en.wikipedia.org/wiki/Euclidean_distance)_
 The algorithm is as follows:

 1. Check for errors like invalid/inconsistent data
-2. Initialize the k cluster locations with initial/random k points
+2. Initialize the `k` cluster locations with initial/random `k` points
 3. Calculate the distance of each data point from each cluster
-4. Assign the cluster label of each data point equal to that of the cluster at it's minimum distance
+4. Assign the cluster label of each data point equal to that of the cluster at its minimum distance
 5. Calculate the centroid of each cluster based on the data points it contains
 6. Repeat each of the above steps until the centroid locations are varying

 Here is a visualization of k-Means clustering for better understanding:

 ![KNN Visualization 1](https://upload.wikimedia.org/wikipedia/commons/e/ea/K-means_convergence.gif)

-_Image source: [Wikipedia](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)_
+_Image source: [Wikipedia](https://en.wikipedia.org/wiki/K-means_clustering)_

-The centroids are moving continously in order to create better distinction between the different set of data points. As we can see, after a few iterations, the difference in centroids is quite low between iterations. For example between itrations `13` and `14` the difference is quite small because there the optimizer is tuning boundary cases.
+The centroids are moving continuously in order to create better distinction between the different set of data points. As we can see, after a few iterations, the difference in centroids is quite low between iterations. For example between iterations `13` and `14` the difference is quite small because there the optimizer is tuning boundary cases.

 ## References
```
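Steps 3–4 of the algorithm boil down to a distance computation plus an argmin over cluster centers. A minimal standalone sketch in JavaScript (the `euclideanDistance` and `nearestCluster` helpers here are illustrative names, not the repository's modules):

```javascript
// Euclidean distance between two points of equal dimension.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Steps 3-4: return the index of the cluster center nearest to the point.
function nearestCluster(point, clusterCenters) {
  const distances = clusterCenters.map((center) => euclideanDistance(point, center));
  return distances.indexOf(Math.min(...distances));
}

// [3, 4] is 5 units from the origin but only 1 unit from [4, 4],
// so it is assigned to the cluster at index 1.
euclideanDistance([0, 0], [3, 4]); // 5
nearestCluster([3, 4], [[0, 0], [4, 4]]); // 1
```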

```diff
@@ -1,36 +1,40 @@
-import kMeans from '../kmeans';
+import KMeans from '../kMeans';

 describe('kMeans', () => {
   it('should throw an error on invalid data', () => {
     expect(() => {
-      kMeans();
-    }).toThrowError('Either dataSet or labels or toClassify were not set');
+      KMeans();
+    }).toThrowError('The data is empty');
   });

   it('should throw an error on inconsistent data', () => {
     expect(() => {
-      kMeans([[1, 2], [1]], 2);
-    }).toThrowError('Inconsistent vector lengths');
+      KMeans([[1, 2], [1]], 2);
+    }).toThrowError('Matrices have different shapes');
   });

   it('should find the nearest neighbour', () => {
-    const dataSet = [[1, 1], [6, 2], [3, 3], [4, 5], [9, 2], [2, 4], [8, 7]];
+    const data = [[1, 1], [6, 2], [3, 3], [4, 5], [9, 2], [2, 4], [8, 7]];
     const k = 2;
-    const expectedCluster = [0, 1, 0, 1, 1, 0, 1];
-    expect(kMeans(dataSet, k)).toEqual(expectedCluster);
+    const expectedClusters = [0, 1, 0, 1, 1, 0, 1];
+    expect(KMeans(data, k)).toEqual(expectedClusters);
+
+    expect(KMeans([[0, 0], [0, 1], [10, 10]], 2)).toEqual(
+      [0, 0, 1],
+    );
   });

   it('should find the clusters with equal distances', () => {
     const dataSet = [[0, 0], [1, 1], [2, 2]];
     const k = 3;
     const expectedCluster = [0, 1, 2];
-    expect(kMeans(dataSet, k)).toEqual(expectedCluster);
+    expect(KMeans(dataSet, k)).toEqual(expectedCluster);
   });

   it('should find the nearest neighbour in 3D space', () => {
     const dataSet = [[0, 0, 0], [0, 1, 0], [2, 0, 2]];
     const k = 2;
     const expectedCluster = [1, 1, 0];
-    expect(kMeans(dataSet, k)).toEqual(expectedCluster);
+    expect(KMeans(dataSet, k)).toEqual(expectedCluster);
   });
 });
```

src/algorithms/ml/k-means/kMeans.js

Lines changed: 85 additions & 0 deletions

```javascript
import * as mtrx from '../../math/matrix/Matrix';
import euclideanDistance from '../../math/euclidean-distance/euclideanDistance';

/**
 * Classifies the points in space based on the k-Means algorithm.
 *
 * @param {number[][]} data - array of dataSet points, i.e. [[0, 1], [3, 4], [5, 7]]
 * @param {number} k - number of clusters
 * @return {number[]} - the cluster number assigned to each point
 */
export default function KMeans(
  data,
  k = 1,
) {
  if (!data) {
    throw new Error('The data is empty');
  }

  // Assign k cluster locations equal to the locations of the initial k points.
  const dataDim = data[0].length;
  const clusterCenters = data.slice(0, k);

  // Continue optimization till convergence.
  // Centroids should not be moving once optimized.
  // Calculate distance of each candidate vector from each cluster center.
  // Assign cluster number to each data vector according to minimum distance.

  // Matrix of distances from each data point to each cluster centroid.
  const distances = mtrx.zeros([data.length, k]);

  // Vector of data points' classes. The value of -1 means that no class has been assigned yet.
  const classes = Array(data.length).fill(-1);

  let iterate = true;
  while (iterate) {
    iterate = false;

    // Calculate and store the distance of each data point from each cluster.
    for (let dataIndex = 0; dataIndex < data.length; dataIndex += 1) {
      for (let clusterIndex = 0; clusterIndex < k; clusterIndex += 1) {
        distances[dataIndex][clusterIndex] = euclideanDistance(
          [clusterCenters[clusterIndex]],
          [data[dataIndex]],
        );
      }
      // Assign the closest cluster number to each dataSet point.
      const closestClusterIdx = distances[dataIndex].indexOf(
        Math.min(...distances[dataIndex]),
      );

      // Check if the data point's class has changed and we still need to re-iterate.
      if (classes[dataIndex] !== closestClusterIdx) {
        iterate = true;
      }

      classes[dataIndex] = closestClusterIdx;
    }

    // Recalculate cluster centroid values via all dimensions of the points under it.
    for (let clusterIndex = 0; clusterIndex < k; clusterIndex += 1) {
      // Reset cluster center coordinates since we need to recalculate them.
      clusterCenters[clusterIndex] = Array(dataDim).fill(0);
      let clusterSize = 0;
      for (let dataIndex = 0; dataIndex < data.length; dataIndex += 1) {
        if (classes[dataIndex] === clusterIndex) {
          // Register one more data point of the current cluster.
          clusterSize += 1;
          for (let dimensionIndex = 0; dimensionIndex < dataDim; dimensionIndex += 1) {
            // Add data point coordinates to the cluster center coordinates.
            clusterCenters[clusterIndex][dimensionIndex] += data[dataIndex][dimensionIndex];
          }
        }
      }
      // Calculate the average for each cluster center coordinate.
      for (let dimensionIndex = 0; dimensionIndex < dataDim; dimensionIndex += 1) {
        clusterCenters[clusterIndex][dimensionIndex] = parseFloat(Number(
          clusterCenters[clusterIndex][dimensionIndex] / clusterSize,
        ).toFixed(2));
      }
    }
  }

  // Return the clusters assigned.
  return classes;
}
```
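The loop above can be exercised end-to-end with a compact standalone sketch that mirrors the same steps, using plain arrays and an inline distance helper instead of the repository's `Matrix` and `euclideanDistance` modules (`kMeansSketch` is a hypothetical name for illustration):

```javascript
// Standalone sketch of the k-Means loop: seed centers, assign, re-center, repeat.
function kMeansSketch(data, k = 1) {
  const dist = (a, b) => Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));

  // Step 2: seed the k cluster centers with the first k points.
  let centers = data.slice(0, k).map((p) => [...p]);
  const classes = Array(data.length).fill(-1);

  let iterate = true;
  while (iterate) {
    iterate = false;

    // Steps 3-4: assign each point to its nearest center; re-iterate on any change.
    data.forEach((point, i) => {
      const distances = centers.map((c) => dist(point, c));
      const closest = distances.indexOf(Math.min(...distances));
      if (classes[i] !== closest) iterate = true;
      classes[i] = closest;
    });

    // Step 5: move each center to the mean of its assigned points.
    centers = centers.map((center, ci) => {
      const members = data.filter((_, i) => classes[i] === ci);
      return center.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  return classes;
}

// Two nearby points end up in one cluster; the far point gets its own.
kMeansSketch([[0, 0], [0, 1], [10, 10]], 2); // [0, 0, 1]
```

Tracing the example: the initial centers are `[0, 0]` and `[0, 1]`, so `[0, 1]` first lands in cluster 1; after cluster 1's center moves toward `[10, 10]`, the point is reassigned to cluster 0, and the next pass makes no changes, ending the loop.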

src/algorithms/ml/kmeans/kmeans.js

Lines changed: 0 additions & 98 deletions

This file was deleted.
