WCSE 2015
ISBN: 978-981-09-5471-0 DOI: 10.18178/wcse.2015.04.077

An Improved Clustering Algorithm for Big Data Based on K-Means with Optimized Clusters’ Number

Lianjiang Zhu, Tao Du, Shouning Qu, Kai Wang, Yong Zhang

Abstract— To improve the processing of big data, a new clustering algorithm based on K-means is proposed. In this algorithm, a “Silhouette Coefficient” is defined to evaluate the result of clustering. Based on the silhouette coefficient, the optimal number of clusters is chosen, and K-means is then run with that number of clusters. The algorithm is tested on a real production big data set and compared with classical K-means. The experimental results show that the improved algorithm produces a more reasonable clustering with little extra computation.

Index Terms— Big data, Silhouette Coefficient, clustering, optimized clusters’ number.
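
The procedure summarized in the abstract (score candidate cluster counts with the silhouette coefficient, then run K-means with the best one) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their exact definition, candidate range, or initialization. The sketch assumes the standard silhouette coefficient s = (b - a) / max(a, b), a deterministic farthest-point initialization, and a candidate range of 2 to 6 clusters. The function names `kmeans`, `silhouette`, and `choose_k` and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialization (not from the paper).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst case
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # common convention: a singleton contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_min=2, k_max=6):
    """Score every candidate k and return the one with the highest silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated groups of 20 points each.
rng = random.Random(1)
data = [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
        for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]

best_k, scores = choose_k(data)
final_labels = kmeans(data, best_k)  # final clustering with the selected k
print(best_k)
```

The pairwise silhouette computation here costs O(n²) distance evaluations per candidate k, so on a genuinely large data set it would likely need sampling or another approximation; how the paper keeps the extra cost small is not stated in the abstract.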

Lianjiang Zhu, Shouning Qu
Information network centre, University of Jinan, CHINA
Tao Du, Kai Wang
School of information science and engineering, University of Jinan, CHINA
Yong Zhang
School of electrical engineering, University of Jinan, CHINA

Cite: Lianjiang Zhu, Tao Du, Shouning Qu, Kai Wang, Yong Zhang, "An Improved Clustering Algorithm for Big Data Based on K-Means with Optimized Clusters’ Number," 2015 The 5th International Workshop on Computer Science and Engineering-Information Processing and Control Engineering (WCSE 2015-IPCE), pp. 467-471, Moscow, Russia, April 15-17, 2015.