
Cluster Validation for Mixed-Type Data

Aschenbruck, Rabea; Szepannek, Gero

Abstract:

For cluster analysis of mixed-type data (i.e., data consisting of numerical and categorical variables), comparatively few clustering methods are available. One popular approach to this kind of problem is the k-prototypes algorithm (Huang, 1998), an extension of the k-means algorithm, which is implemented in the R package clustMixType (Szepannek and Aschenbruck, 2019).
The choice of a suitable number of clusters k is known to be particularly crucial in partitioning clustering procedures, yet many implementations of cluster validation indices in R are not suitable for mixed-type data. This paper examines whether validation indices such as the Gamma index, the Average Silhouette Width, and the Dunn index can be transferred to mixed-type data. Furthermore, the R package clustMixType is extended by these indices, and their application is demonstrated. Finally, the behaviour of the adapted indices is tested in a short simulation study using different data scenarios.
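
To make the idea of transferring a validation index to mixed-type data concrete, the following is a minimal sketch of the Average Silhouette Width computed with a k-prototypes-style dissimilarity (squared Euclidean distance on the numerical variables plus a weight times the number of mismatching categorical variables). It is written in Python for illustration only and is not the clustMixType API. The function names, the data layout, and the default weight `lam` are assumptions made here, and the paper's exact adaptation of the index may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout: each observation is (numeric values, categorical values).
Point = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Point, b: Point, lam: float = 1.0) -> float:
    """k-prototypes-style dissimilarity: squared Euclidean distance on the
    numeric part plus lam times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    mismatches = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + lam * mismatches


def average_silhouette_width(points: List[Point], labels: List[int],
                             lam: float = 1.0) -> float:
    """Mean silhouette width over all observations, using mixed_distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    widths = []
    for i, p in enumerate(points):
        # Accumulate distances from observation i to every cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += mixed_distance(p, q, lam)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data (invented for this sketch): two well-separated groups.
toy_points = [([0.0], ["a"]), ([0.1], ["a"]), ([5.0], ["b"]), ([5.1], ["b"])]
asw_good = average_silhouette_width(toy_points, [0, 0, 1, 1])
asw_bad = average_silhouette_width(toy_points, [0, 1, 0, 1])
print(asw_good > asw_bad)
```

In this sketch, a partition that matches the two groups yields a higher index value than one that splits them, which is the property used when comparing candidate values of k.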


DOI: 10.5445/KSP/1000098011/02
Published on: 23 June 2020
Affiliated institution(s) at KIT: Institut für Wirtschaftsinformatik und Marketing (IISM)
Publication type: Journal article
Publication year: 2020
Language: English
Identifier: ISSN 2363-9881
KITopen-ID: 1000120412
Published in: Archives of Data Science, Series A
Volume: 6
Issue: 1
Pages: P02, 12 pp. (online)