Gondeau, Alexandre and Makarenkov, Vladimir - Archives of Data Science, Series B

Article Details

Title Identification of patient classes in low back pain data using crisp and fuzzy clustering methods
Authors Gondeau, Alexandre and Makarenkov, Vladimir
Year 2019
Volume 1(1)
Abstract We performed a cluster analysis of the low back pain dataset in the framework of the IFCS-2017 data challenge. Because the original data contained missing values, the first part of our analysis concerned their imputation using the Fully Conditional Specification model. The Local Outlier Factor method was then used to detect and eliminate outliers. After normalizing the data, we removed highly correlated variables from the transformed dataset and carried out k-means clustering of the remaining variables based on their correlations, i.e., the variables with the highest mutual correlations were assigned to the same cluster. Once the variables were assigned to clusters, one representative per cluster was selected, i.e., the variable with the highest contribution score on the first principal component. The 13 selected variables include representatives of each of the 6 variable domains (contextual factor, participation, pain, psychological, activity and physical impairment) identified as important in the paper by Nielsen et al. (2016). Different clustering methods, including DAPC, k-means and k-medoids, were then applied to the reduced low back pain data. Consensus solutions, both crisp and fuzzy, were calculated using the GV3 method. The resulting crisp consensus clustering, comprising 5 classes, was described in detail and compared to the meta-data annotation.
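The variable-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.9 correlation threshold, the toy variable names, and the hand-assigned cluster are all assumptions made for illustration, the k-means assignment of variables to clusters is taken as given, and a variable's "contribution" to the first principal component is assumed to be proportional to its squared loading. The first principal component is approximated by power iteration on the correlation matrix, which corresponds to PCA on standardized variables.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a variable only if its absolute correlation
    with every already-kept variable is at most `threshold`.
    The threshold value is an assumption, not taken from the paper."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def first_pc_loadings(columns, names, iters=200, seed=0):
    """Leading eigenvector of the correlation matrix of `names`,
    approximated by power iteration from a seeded random start."""
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in names]
    for _ in range(iters):
        w = [sum(r * x for r, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return dict(zip(names, v))


def representative(columns, cluster):
    """Variable with the largest squared loading on the cluster's first PC."""
    loadings = first_pc_loadings(columns, cluster)
    return max(cluster, key=lambda name: loadings[name] ** 2)


# Toy data (hypothetical): sines of different integer frequencies sampled
# over one full period are mutually uncorrelated, so the correlations
# between the combined variables below are known exactly.
n = 360
t = [2 * math.pi * i / n for i in range(n)]


def s(k):
    return [math.sin(k * ti) for ti in t]


def add(x, y, weight=1.0):
    return [a + weight * b for a, b in zip(x, y)]


columns = {
    "pain_a": s(1),
    "pain_a_copy": add(s(1), s(2), 0.05),  # near-duplicate of pain_a
    "pain_b": add(s(1), s(3)),             # shares the pain_a signal
    "pain_c": add(s(1), s(4)),             # shares the pain_a signal
    "activity": s(5),                      # unrelated to the others
}

kept = drop_correlated(columns)
rep = representative(columns, ["pain_a", "pain_b", "pain_c"])
print(kept)
print(rep)
```

In this toy setting the near-duplicate variable is filtered out, and the variable that carries the signal shared by the whole cluster is chosen as its representative. On real data one would typically rely on an established numerical library for the correlation matrix and the PCA rather than on hand-written power iteration.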