ISBN: 978-981-18-7950-0 DOI: 10.18178/wcse.2023.06.007
A Partitioning and Distributed Caching Approach Based on Adaptive Spectral Clustering for Big Data Streams
Abstract—Diverse big data streams are generally processed in parallel computing environments with multiple computational nodes. Before processing, a stream must be partitioned into sub-streams, which are cached on the computational nodes for subsequent processing. Existing partitioning methods struggle with streams that are diverse and high-dimensional. Low-quality partitioning leads to poor cache placement, which in turn causes more data migration, lower computational efficiency, and lower velocity of big data streams through the processing system. Inspired by the ability of spectral clustering to identify arbitrary manifolds, we propose a partitioning approach for big data streams based on spectral clustering, which transforms the partitioning of the streams received during each micro window into the clustering of similarity graphs. We formulate an optimization problem over the data items received during each micro window and present an algorithm to optimize the similarity graphs. Because the characteristics of data streams change gradually between adjacent windows, we also present a distributed caching algorithm, based on the stream partitioning, for continuous windows. Experimental analysis shows that the proposed method can significantly improve the velocity and efficiency of a stream processing system.
Index Terms—big data stream, adaptive spectral clustering, distributed caching, similarity matrix
Shun Wang, Guo-sun Zeng
Department of Computer Science and Technology, Tongji University, CHINA
Tongji Branch, National Engineering & Technology Center of High Performance Computer, CHINA
Cite: Shun Wang, Guo-sun Zeng, "A Partitioning and Distributed Caching Approach Based on Adaptive Spectral Clustering for Big Data Streams," Proceedings of 2023 the 13th International Workshop on Computer Science and Engineering (WCSE 2023), pp. 37-47, June 16-18, 2023.