A Novel Clustering-based Class-association Rule Mining Method for Handling Class-Imbalanced Datasets
Abstract— Class-association rule (CAR) mining is a knowledge discovery technique with many practical applications. It extends association rule mining by incorporating information about data classes, so that the derived rules relate items to classes. On class-imbalanced datasets, however, it is difficult to mine the rules related to minority classes. One solution is to cluster the data first and then mine CARs, so that the items of minority classes are grouped into a few clusters and the corresponding rules become easier to detect. The k-means clustering method is often used for this step because it is fast. However, the clustering results of k-means are non-deterministic, which may degrade the clustering quality. In this study, we propose a new approach that combines k-means with Hierarchical Agglomerative Clustering and then performs class-association rule mining. Our method has the same execution time as k-means but achieves better clustering quality, so the generated rules are also more accurate, as illustrated by the experimental results.
Index Terms— accuracy, CARs, classification, class-imbalanced dataset, clustering
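The cluster-then-mine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mine_cars` and `mine_per_cluster`, the toy transactions, the thresholds, and in particular the cluster assignment `cluster_ids` are all hypothetical, and the clustering step itself (k-means combined with Hierarchical Agglomerative Clustering) is not reproduced here. The sketch only shows why a minority-class rule that falls below the support threshold on the whole dataset can pass it inside a cluster.

```python
from collections import Counter
from itertools import combinations


def mine_cars(rows, min_support, min_confidence, max_len=2):
    """Brute-force CAR mining over rows of (item_set, class_label).

    Returns {(itemset, label): (support, confidence)} for rules
    itemset -> label that meet both thresholds.
    """
    n = len(rows)
    itemset_counts = Counter()  # rows containing the itemset
    rule_counts = Counter()     # rows containing the itemset AND having the label
    for items, label in rows:
        for size in range(1, max_len + 1):
            for itemset in combinations(sorted(items), size):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1
    rules = {}
    for (itemset, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules[(itemset, label)] = (support, confidence)
    return rules


def mine_per_cluster(rows, cluster_ids, min_support, min_confidence):
    """Group rows by a given cluster id, then mine CARs inside each group."""
    groups = {}
    for row, cid in zip(rows, cluster_ids):
        groups.setdefault(cid, []).append(row)
    return {cid: mine_cars(group, min_support, min_confidence)
            for cid, group in groups.items()}


# Toy imbalanced dataset (hypothetical): 18 "neg" rows, 2 "pos" rows.
rows = ([({"c", "d"}, "neg")] * 16
        + [({"c", "e"}, "neg")] * 2
        + [({"a", "e"}, "pos")] * 2)

# Assumed cluster assignment; in practice it would come from the clustering step.
cluster_ids = [0] * 16 + [1] * 4

global_rules = mine_cars(rows, min_support=0.2, min_confidence=0.8)
cluster_rules = mine_per_cluster(rows, cluster_ids,
                                 min_support=0.2, min_confidence=0.8)

print("global rules:", sorted(global_rules))
print("cluster 1 rules:", sorted(cluster_rules[1]))
```

On the whole dataset the rule `a -> pos` covers only 2 of 20 rows, so it fails the 0.2 support threshold. Inside cluster 1 the same rule covers 2 of 4 rows and is retained.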
People’s Police College II, VIETNAM
Ho Chi Minh City University of Technology, VIETNAM
Ho Chi Minh City Open University, VIETNAM
Cite: Tien-Dung Phan, Thanh-Tho Quan, Thi-Kim-Anh Vo, "A Novel Clustering-based Class-asociation Rule Mining Method for Handling Class-Imbalanced Datasets," Proceedings of 2020 the 10th International Workshop on Computer Science and Engineering (WCSE 2020), pp. 6-10, Yangon (Rangoon), Myanmar (Burma), February 26-28, 2020.