This is my function for clustering using k-medoids:
```python
def kMedoids(D, k, tmax=100):
    # determine dimensions of distance matrix D
    m, n = D.shape

    # randomly initialize an array of k medoid indices
    M = np.sort(np.random.choice(n, k))

    # create a copy of the array of medoid indices
    Mnew = np.copy(M)

    # initialize a dictionary to represent clusters
    C = {}
    for t in range(tmax):
        # determine clusters, i.e. arrays of data indices
        J = np.argmin(D[:, M], axis=1)
        for kappa in range(k):
            C[kappa] = np.where(J == kappa)[0]
        # update cluster medoids
        for kappa in range(k):
            J = np.mean(D[np.ix_(C[kappa], C[kappa])], axis=1)
            j = np.argmin(J)
            Mnew[kappa] = C[kappa][j]
        np.sort(Mnew)
        # check for convergence
        if np.array_equal(M, Mnew):
            break
        M = np.copy(Mnew)
    else:
        # final update of cluster memberships
        J = np.argmin(D[:, M], axis=1)
        for kappa in range(k):
            C[kappa] = np.where(J == kappa)[0]

    # return results
    return M, C
```

I call the function `kMedoids` like this; I think my program runs slowly at the line `D = pairwise_distances(arraydata, metric='euclidean')`:
```python
D = pairwise_distances(arraydata, metric='euclidean')

# split into 2 clusters
M, C = kMedoids(D, 2)

print('medoids:')
for point_idx in M:
    print(arraydata[point_idx])
print('')

# arrays to collect labels and indices
temp = []
indeks = []
print('clustering result:')
for label in C:
    for point_idx in C[label]:
        print('label {0}: {1}'.format(label, arraydata[point_idx]))
        temp.append(label)
        indeks.append(point_idx)
```

This is the result from this program:
```
clustering result:
label 0: [0.00000000e+00 0.00000000e+00 1.00000000e+00 1.00000000e+00
 1.00000000e+00 1.00000000e+00 1.00000000e+00 1.00000000e+00 ...
 1.00000000e+00 0.00000000e+00 1.00000000e+00 1.00000000e+00
```

Why is my program slow for large data, and why does it almost always end with a `MemoryError`? I hope someone can review this code and help me improve its performance so it can process large amounts of data.
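For context (a back-of-the-envelope calculation, not part of the original question): the dense `n × n` float64 distance matrix produced by `pairwise_distances` is the likely source of the `MemoryError`, since its memory footprint grows quadratically with the number of samples:

```python
def distance_matrix_bytes(n_samples, itemsize=8):
    """Memory needed for a dense n x n distance matrix of float64 (8-byte) entries."""
    return n_samples * n_samples * itemsize

# how the memory requirement explodes as the dataset grows
for n in (1_000, 50_000, 200_000):
    gib = distance_matrix_bytes(n) / 2**30
    print(f'{n:>7} samples -> {gib:.2f} GiB')
```

At 50,000 samples the matrix alone needs roughly 18.6 GiB, which already exceeds the RAM of most machines; the `D[np.ix_(...)]` fancy-indexing inside the loop allocates further temporary copies on top of that.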
- This is still on-topic since the error is due to slow performance (on large sets of data). – dustytrash, Jun 24, 2019
- `return M, C` looks misindented. Please double-check your indentation. – vnp, Jun 24, 2019
1 Answer
This is exactly the implementation found in *NumPy and SciPy Recipes for Data Science: k-Medoids Clustering*, but with some indentation mistakes (probably introduced during the copy & paste process). To understand the indentation used in the algorithm, it is worth reading the answers on the `for...else` syntax:
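As a reminder (a minimal toy example of my own, not from the recipe): the `else` branch of a `for` loop runs only when the loop finishes without hitting `break`:

```python
def find_first_negative(values):
    """Return the index of the first negative value, or None if there is none."""
    for i, v in enumerate(values):
        if v < 0:
            break  # leaves the loop; the else branch is skipped
    else:
        # runs only if the loop completed without break
        return None
    return i

print(find_first_negative([3, 1, -2, 5]))  # 2
print(find_first_negative([3, 1, 5]))      # None
```

In `kMedoids`, the `else` branch therefore performs a final cluster-membership update only when the loop exhausted all `tmax` iterations without converging (i.e. without reaching `break`).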
A working script with the correct indentation and some improvements can be found at https://github.com/letiantian/kmedoids/blob/master/kmedoids.py :
```python
import numpy as np
import random

def kMedoids(D, k, tmax=100):
    # determine dimensions of distance matrix D
    m, n = D.shape

    if k > n:
        raise Exception('too many medoids')

    # find a set of valid initial cluster medoid indices since we
    # can't seed different clusters with two points at the same location
    valid_medoid_inds = set(range(n))
    invalid_medoid_inds = set([])
    rs, cs = np.where(D == 0)
    # the rows, cols must be shuffled because we will keep the first duplicate below
    index_shuf = list(range(len(rs)))
    np.random.shuffle(index_shuf)
    rs = rs[index_shuf]
    cs = cs[index_shuf]
    for r, c in zip(rs, cs):
        # if there are two points with a distance of 0...
        # keep the first one for cluster init
        if r < c and r not in invalid_medoid_inds:
            invalid_medoid_inds.add(c)
    valid_medoid_inds = list(valid_medoid_inds - invalid_medoid_inds)

    if k > len(valid_medoid_inds):
        raise Exception('too many medoids (after removing {} duplicate points)'.format(
            len(invalid_medoid_inds)))

    # randomly initialize an array of k medoid indices
    M = np.array(valid_medoid_inds)
    np.random.shuffle(M)
    M = np.sort(M[:k])

    # create a copy of the array of medoid indices
    Mnew = np.copy(M)

    # initialize a dictionary to represent clusters
    C = {}
    for t in range(tmax):  # the original script uses Python 2's xrange here
        # determine clusters, i.e. arrays of data indices
        J = np.argmin(D[:, M], axis=1)
        for kappa in range(k):
            C[kappa] = np.where(J == kappa)[0]
        # update cluster medoids
        for kappa in range(k):
            J = np.mean(D[np.ix_(C[kappa], C[kappa])], axis=1)
            j = np.argmin(J)
            Mnew[kappa] = C[kappa][j]
        np.sort(Mnew)
        # check for convergence
        if np.array_equal(M, Mnew):
            break
        M = np.copy(Mnew)
    else:
        # final update of cluster memberships
        J = np.argmin(D[:, M], axis=1)
        for kappa in range(k):
            C[kappa] = np.where(J == kappa)[0]

    # return results
    return M, C
```

Note that the `else` as well as the `return` are at the same indentation level as the `for` loop.
However, I recommend changing the second inner loop

```python
# update cluster medoids
for kappa in range(k):
    J = np.mean(D[np.ix_(C[kappa], C[kappa])], axis=1)
    j = np.argmin(J)
    Mnew[kappa] = C[kappa][j]
```

to

```python
# update cluster medoids
for kappa in range(k):
    J = np.mean(D[np.ix_(C[kappa], C[kappa])], axis=1)
    # Fix for the low-index bias by J. Nagele (2019):
    shuffled_idx = np.arange(len(J))
    np.random.shuffle(shuffled_idx)
    j = shuffled_idx[np.argmin(J[shuffled_idx])]
    Mnew[kappa] = C[kappa][j]
```

as otherwise the resulting medoids depend on the order of the input: `np.argmin` always returns the lowest index when multiple potential medoids have equal distances to the cluster members.
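A quick illustration of that tie-breaking behaviour (a toy example of my own):

```python
import numpy as np

J = np.array([0.5, 0.2, 0.2, 0.9])  # indices 1 and 2 are tied for the minimum

# argmin deterministically picks the first (lowest-index) minimum
print(np.argmin(J))  # 1

# shuffling the candidate order first makes the choice among ties random
rng = np.random.default_rng()
shuffled_idx = np.arange(len(J))
rng.shuffle(shuffled_idx)
j = shuffled_idx[np.argmin(J[shuffled_idx])]
print(j)  # 1 or 2, depending on the shuffle
```

Without the shuffle, reordering the input rows can silently change which of several equally good medoids is selected, which makes results order-dependent.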
- It might be a better answer if you put the part about the indentation first; remember, since this is Code Review, the answer should be mostly about the user's code. – Dec 17, 2019
- Please quote the relevant parts of your links so readers don't have to visit other sites and so that your answer will still make sense even if the link(s) no longer work. – Null, Dec 17, 2019
- OK, I will change it accordingly. – Jojo, Dec 17, 2019
- I hope it is better now! – Jojo, Dec 17, 2019