Phan, Le; Liu, Hongzhe; and Tortora, Cristina - Archives of Data Science, Series B

Article Details

Title: K-Means Clustering on Multiple Correspondence Analysis Coordinates
Authors: Phan, Le; Liu, Hongzhe; and Tortora, Cristina
Year: 2019
Volume: 1(1)
Abstract: On April 18, 2017, the International Federation of Classification Societies (IFCS) issued a challenge to its members and the classification community to analyze a data set of 928 low back pain patients. In this paper, we present our contribution: a cluster analysis of this data set. We discuss our data cleaning process, which we view as a two-pronged approach: inferring values that are missing not at random and imputing values that are missing at random. We also discuss the challenges of clustering mixed data types and the data transformation required before a clustering algorithm can be applied. We call our proposed data transformation process split-then-join. Finally, we offer our interpretation of the clustering results with respect to validation variables, and we present some thoughts on selecting important variables to classify new observations.