M-N scatter plots technique for evaluating varying-size clusters and setting the parameters of Bi-CoPaM and UNCLES methods
Abu-Jamous B., Fa R., Roberts DJ., Nandi AK.
The recently proposed UNCLES method can unify clustering results from multiple datasets under different types of external specifications. It can also tunably tighten its results, leaving many objects unassigned to any cluster so as to obtain a few tight clusters. Despite the success of this method, setting its parameters, such as the number of clusters (K) and the tuning parameters δ and (δ+, δ-), has never been automated. Because its clusters vary in size, they cannot be validated by existing validation indices. In this study we present a validation technique based on our proposed M-N scatter plots. This technique assigns better fitness values to clusters that include more objects while preserving their tightness, which suits the nature of UNCLES results well. We have applied this technique to a set of bacterial microarray datasets as well as a set of English vowel datasets. Our results demonstrate the success of the M-N plots in selecting the best few clusters out of a pool of clusters generated under varying K, δ, and (δ+, δ-) values. Our results also show that the best few clusters can originate from different partitions, demonstrating the ability of our technique to evaluate individual clusters rather than whole partitions. Finally, although this technique is proposed within the context of the UNCLES framework, it is readily applicable to other clustering results, especially when the parameters are not confidently predefined. © 2014 IEEE.
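The abstract does not define how M and N are computed, so the following is only a minimal sketch under stated assumptions. It assumes that M is a dispersion measure (here, the mean squared Euclidean distance of a cluster's members to its centroid) and N is the number of objects in the cluster. It also assumes both axes are normalised over the whole pool and each cluster is scored by its distance to the ideal corner (M = 0, N = maximum), so that large but tight clusters score best. Clusters are then picked greedily, and any candidate that shares objects with an already-selected cluster is discarded. The names `dispersion` and `select_clusters`, the normalisation, and the greedy overlap rule are hypothetical illustrations, not necessarily the paper's exact formulation.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dispersion(points: Sequence[Point]) -> float:
    """Assumed M value: mean squared Euclidean distance from members to their centroid."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(
        sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points
    ) / len(points)


def select_clusters(
    data: Dict[str, Point], pool: List[Set[str]], max_clusters: int
) -> List[int]:
    """Hypothetical selector: greedily pick pool indices closest to (M = 0, N = max)."""
    m_values = [dispersion([data[obj] for obj in cluster]) for cluster in pool]
    n_values = [len(cluster) for cluster in pool]
    # Normalise both axes to [0, 1] over the pool; guard against an all-zero M axis.
    max_m = max(m_values) or 1.0
    max_n = max(n_values)
    scores = [
        math.hypot(m / max_m, 1.0 - n / max_n) for m, n in zip(m_values, n_values)
    ]

    remaining = list(range(len(pool)))
    selected: List[int] = []
    while remaining and len(selected) < max_clusters:
        # Smallest distance to the ideal corner means the cluster is both tight and large.
        best = min(remaining, key=lambda i: scores[i])
        selected.append(best)
        # Assumed rule: drop candidates sharing objects with the chosen cluster,
        # so the final clusters are disjoint.
        remaining = [i for i in remaining if i != best and not (pool[i] & pool[best])]
    return selected


if __name__ == "__main__":
    data = {
        "a1": (0.0, 0.0), "a2": (0.1, 0.0), "a3": (0.0, 0.1), "a4": (0.1, 0.1),
        "b1": (5.0, 5.0), "b2": (5.1, 5.0), "b3": (5.0, 5.1),
        "c1": (10.0, 0.0), "c2": (14.0, 4.0),
    }
    # A pool mixing clusters that could come from different partitions (different K or δ).
    pool = [
        {"a1", "a2", "a3", "a4"},  # tight, 4 objects
        {"a1", "a2"},              # tight, 2 objects, overlaps the first
        {"b1", "b2", "b3"},        # tight, 3 objects
        {"c1", "c2"},              # loose, 2 objects
    ]
    print(select_clusters(data, pool, 2))
```

Because each cluster is scored individually rather than as part of a partition, the selected clusters may come from different partitions in the pool, which matches the behaviour the abstract reports. A different dispersion measure or normalisation would change the scores but not the overall select-then-discard-overlaps structure.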