Cluster Analysis of Myanmar Census Data using Hybrid Algorithm
Abstract— Clustering, one of the core data-mining techniques, is useful for extracting valuable information and has been applied in many application areas. The census dataset used here comes from the 2014 Myanmar Population and Housing Census, whose key objective is to provide the Government and other stakeholders with important information about the population in terms of demographic, educational, social, and economic characteristics, living conditions, and household amenities. To avoid performance and cluster-quality issues, this paper proposes a hybrid clustering algorithm that combines Partitioning Around Medoids (PAM) with the Bat algorithm, a metaheuristic. The Bat algorithm is adapted to the data clustering problem so that its multimodal search capability can locate multiple optimal medoids. To handle large volumes of data, the Apache Spark parallel framework is used to run the proposed algorithm. Experimental results show that, on large datasets, the proposed algorithm significantly reduces computation time while achieving performance comparable to PAM. Cluster quality is also evaluated using silhouette validation, and the proposed algorithm is observed to perform well.
Index Terms— clustering, PAM, Bat, Apache Spark, Silhouette
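As an illustration of the silhouette validation mentioned in the abstract, the following is a minimal sketch of the standard silhouette coefficient, assuming Euclidean distance and small in-memory data. The function name `silhouette_score` and the toy points are hypothetical and are not taken from the paper, and the sketch does not reproduce the authors' Spark-based implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (illustrative sketch)."""
    # Group points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes s = 0.
            continue
        # a: mean distance from p to the other members of its own cluster
        # (dist(p, p) = 0, so dividing by len(own) - 1 excludes p itself).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible assignment
bad = silhouette_score(points, [0, 1, 0, 1])   # groups deliberately mixed
```

Values close to 1 indicate compact, well-separated clusters, while values below 0 suggest that points sit closer to another cluster than to their own. This is how the score can be used to compare the output of different clustering algorithms on the same data.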
Nway Yu Aung, Kyawt Kyawt San, Swe Zin Hlaing
University of Information Technology (UIT), Yangon, MYANMAR
Cite: Nway Yu Aung, Kyawt Kyawt San, Swe Zin Hlaing, "Cluster Analysis of Myanmar Census Data using Hybrid Algorithm," Proceedings of 2021 the 11th International Workshop on Computer Science and Engineering (WCSE 2021), pp. 54-58, February 25-27, 2021.