Issue
I started clustering using simple k-means clustering in Weka.
After the clustering finished, it showed this result:
Number of iterations: 9
Within cluster sum of squared errors: 570.1974952009115
My questions:
1. The sum of squared errors is huge. Does this mean my number of clusters is wrong? And how do I determine the optimal number of clusters?
2. How do I split the data into a training set and a test set to evaluate the performance? And how do I know the right percentage?
3. How do I measure the SSB?
Solution
1.1 In k-means, you are the one who decides how many clusters (k) to pick. You probably know this already.
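1.2 In k-means there is no optimal number of clusters in the sense of a "global maximum of a function graph". You decide based on your business problem. See also the "elbow method", a semi-empirical procedure that plots the within-cluster sum of squared errors against the number of clusters and looks for the point where the curve stops dropping steeply; in practice it seldom gives a clear answer.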
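Below is a minimal pure-Python sketch of the elbow method. It is not Weka's SimpleKMeans: the names `kmeans`, `elbow_curve` and the toy data are hypothetical, and Weka's numbers can differ (for example because of a different initialization or attribute scaling). It also shows why a large SSE alone says little about k: the SSE depends on the scale and number of your instances, and it tends to fall as k grows, reaching 0 when every point is its own cluster.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (centers, within-cluster SSE)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    # Within-cluster SSE: squared distance of each point to its nearest center.
    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, sse

def elbow_curve(points, max_k):
    """Within-cluster SSE for k = 1..max_k, the curve the elbow method inspects."""
    return {k: kmeans(points, k)[1] for k in range(1, max_k + 1)}

# Hypothetical toy data: three well-separated groups of points.
data = [(1, 1), (1.2, 0.8), (0.9, 1.1),
        (5, 5), (5.1, 4.9), (4.8, 5.2),
        (9, 1), (9.2, 1.1), (8.9, 0.8)]
for k, sse in elbow_curve(data, 5).items():
    print(k, round(sse, 2))
```

Look for the k after which the printed SSE stops dropping sharply; with real data that bend is often not obvious, which is the weakness mentioned above.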
1.3 You might have outliers in your data that make the sum of squared errors large for any clustering. Outliers stay far away from every cluster center for any reasonable number of clusters, so they keep the sum large no matter which k you pick.
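One way to check this is to rank the points by how much each one adds to the SSE. The sketch below assumes you already have the cluster centers (for instance copied from the clustering output); the centers and points in the example are hypothetical, and `sse_contributions` and `share_of_sse` are illustrative helpers, not Weka functions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse_contributions(points, centers):
    """(point, contribution) pairs, largest first; a contribution is the
    squared distance from the point to its nearest center."""
    contribs = [(p, min(squared_distance(p, c) for c in centers)) for p in points]
    return sorted(contribs, key=lambda pc: pc[1], reverse=True)

def share_of_sse(points, centers, top_n):
    """Fraction of the total within-cluster SSE caused by the top_n points."""
    contribs = sse_contributions(points, centers)
    total = sum(c for _, c in contribs)
    return sum(c for _, c in contribs[:top_n]) / total if total else 0.0

# Hypothetical data: two tight groups plus one far-away point.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
centers = [(0.5, 0.5), (10.5, 10.5)]  # hypothetical centers from a clustering run
print(sse_contributions(points, centers)[0])  # the biggest contributor
print(share_of_sse(points, centers, top_n=1))  # its share of the total SSE
```

If a handful of points account for most of the SSE, they are outlier candidates worth inspecting before changing k.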
2.1 There is no "optimal" percentage split.
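A percentage split just shuffles the instances and cuts them at some fraction. The sketch below is one possible setup, assuming you cluster the training part and then measure how far the unseen test points fall from the learned centers; the 66% fraction, the data, the centers and the helper names are all hypothetical. Re-running it with different fractions or seeds shows how sensitive your result is to the split.

```python
import random

def percentage_split(instances, train_fraction, seed=1):
    """Shuffle a copy of the data and cut it into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def held_out_sse(test, centers):
    """SSE of unseen points against centers learned on the training set."""
    return sum(
        min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centers)
        for p in test
    )

data = [(float(i), float(i % 7)) for i in range(100)]  # hypothetical instances
train, test = percentage_split(data, 0.66)
print(len(train), len(test))
centers = [(25.0, 3.0), (75.0, 3.0)]  # hypothetical centers found on `train`
print(held_out_sse(test, centers))
```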
2.2 You could use visualization to check whether the clusters overlap. Seeing the "decision boundaries" is also easier for your audience to understand.
3.1 What is SSB?
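If SSB means the between-cluster sum of squares, as it commonly does in clustering texts (an assumption, since the question does not define it), it is the size of each cluster times the squared distance from its mean to the overall mean, summed over clusters. When the cluster centers are the cluster means, the total sum of squares splits exactly into SST = SSE + SSB. The sketch below, with a hypothetical `sse_ssb_sst` helper and toy data, computes all three.

```python
def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse_ssb_sst(clusters):
    """clusters: list of non-empty lists of points, one list per cluster.
    Returns (within-cluster SSE, between-cluster SSB, total SST)."""
    all_points = [p for c in clusters for p in c]
    grand_mean = mean(all_points)
    # SSE: spread of points around their own cluster mean.
    sse = sum(squared_distance(p, mean(c)) for c in clusters for p in c)
    # SSB: cluster size times squared distance of cluster mean to grand mean.
    ssb = sum(len(c) * squared_distance(mean(c), grand_mean) for c in clusters)
    # SST: spread of all points around the grand mean.
    sst = sum(squared_distance(p, grand_mean) for p in all_points)
    return sse, ssb, sst

clusters = [[(0, 0), (2, 0)], [(10, 0), (12, 0)]]  # hypothetical clusters
sse, ssb, sst = sse_ssb_sst(clusters)
print(sse, ssb, sst, sse + ssb)
```

For a fixed data set SST does not change, so a larger SSB goes together with a smaller SSE, meaning more compact and better separated clusters.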
Answered By - knb