Differential Evolution for Large-Scale Clustering
Abstract— Clustering is the task of organizing data instances into groups based on their similarity. It plays an essential role in knowledge discovery and data mining, serving either as a preprocessing step or as the final goal of the task. Clustering methods based on evolutionary algorithms (EAs) have been developed to improve the effectiveness and accuracy of clustering. The huge volume of data generated by advances in technology has made data clustering a challenging task and an attractive application for EA-based approaches. Differential Evolution (DE), an instance of EAs, has been used to search for the best solution to clustering problems, and it has proven successful at producing more compact clusters than traditional clustering techniques. This paper presents a parallel differential evolution algorithm on the Spark framework to support the clustering of large amounts of data. Experiments were conducted on several frequently used UCI machine learning datasets. The results show that the proposed approach is effective and comparable to existing algorithms.
Index Terms— Differential Evolution, Clustering, Apache Spark.
Pyae Pyae Win Cho, Thi Thi Soe Nyunt, Thet Thet Aung
University of Computer Studies, MYANMAR
Cite: Pyae Pyae Win Cho, Thi Thi Soe Nyunt, Thet Thet Aung, "Differential Evolution for Large-Scale Clustering," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering WCSE_2019_SPRING, pp. 58-62, Yangon, Myanmar, February 27-March 1, 2019.