Mucha, Hans-Joachim - Archives of Data Science, Series A

Article Details

Title Assessment of Stability in Partitional Clustering Using Resampling Techniques
Authors Mucha, Hans-Joachim
Year 2014
Volume 1(1)
Abstract The assessment of stability in cluster analysis is closely related to the difficult problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers that consider different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given data set in order to investigate the stability of results of partitional clustering. The estimation of the sampling distribution of the adjusted Rand index (ARI) and the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., different cardinalities of the drawn sample) with respect to their performance in finding the true number of clusters for both synthetic and real data.
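
As a rough illustration of the idea in the abstract, the following Python sketch estimates the sampling distribution of the ARI by bootstrapping. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the clustering method (a plain Lloyd-style k-means), the way each bootstrap partition is compared with the reference partition (assigning all original points to the nearest bootstrap center), and every function name (`adjusted_rand_index`, `kmeans`, `assign`, `bootstrap_ari`) are choices made here for illustration only.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same objects (Hubert-Arabie form)."""
    total = comb(len(a), 2)
    if total == 0:  # fewer than two objects: treat as perfect agreement
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd iterations; illustrative stand-in for any partitional method."""
    # Initialize from distinct points so duplicated bootstrap draws
    # cannot produce identical starting centers.
    centers = rng.sample(sorted(set(points)), k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def bootstrap_ari(points, k, n_boot=50, seed=0):
    """ARI of each bootstrap partition against the full-data partition."""
    rng = random.Random(seed)
    reference = assign(points, kmeans(points, k, rng))
    scores = []
    for _ in range(n_boot):
        # Non-parametric bootstrap: draw n points with replacement.
        sample = [rng.choice(points) for _ in points]
        labels = assign(points, kmeans(sample, k, rng))
        scores.append(adjusted_rand_index(reference, labels))
    return scores
```

Calling `bootstrap_ari(points, k)` for several candidate values of `k`, with `points` given as a list of coordinate tuples, yields one list of ARI values per `k`. Comparing the location and spread of those lists is one way to judge which number of clusters gives the most stable partition. A subsampling variant would draw fewer than n points without replacement instead of using `rng.choice`.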