Real-time Big Data Analytics for Feature Selection on Apache Spark
Abstract— Real-time data analysis is a key research topic in many domains. It can be applied to pre-existing or prescriptive models. Its practical benefit is that accounts can be monitored and actions reviewed in real time. MLlib, the Apache Spark machine learning library, provides a platform for real-time analysis, supporting feature extraction, transformation, and selection as well as classification, clustering, and frequent pattern mining. Feature selection identifies the most relevant features in a feature set and removes redundant ones. Specifically, we use Apache Spark to analyze streaming time-series data with MLlib, efficiently extracting high-quality features to obtain a high-quality, high-performance model.
Index Terms— feature selection, Apache Spark, filter method, real-time data
Lwin May Thant, Sabai Phyu
University of Computer Studies, MYANMAR
Cite: Lwin May Thant, Sabai Phyu, "Real-time Big Data Analytics for Feature Selection on Apache Spark," Proceedings of 2020 the 10th International Workshop on Computer Science and Engineering (WCSE 2020), pp. 22-26, Yangon (Rangoon), Myanmar (Burma), February 26-28, 2020.
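The filter-style feature selection named in the abstract and index terms (keep the most relevant features, drop redundant ones) can be illustrated with a minimal sketch. This is not the authors' MLlib pipeline, which is not shown on this page. It is a plain-Python illustration under stated assumptions: relevance is scored by absolute Pearson correlation with the label, redundancy is judged by correlation between features, and the names `pearson`, `filter_select`, `redundancy_threshold`, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def filter_select(columns, label, k, redundancy_threshold=0.95):
    """Pick up to k features by relevance, skipping near-duplicates.

    columns: dict mapping feature name -> list of values (hypothetical layout)
    label:   list of target values, same length as each column
    """
    # Rank features by absolute correlation with the label (relevance).
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], label)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        # Skip a feature that is almost a copy of one already kept (redundancy).
        if all(
            abs(pearson(columns[name], columns[kept])) < redundancy_threshold
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data (made up for illustration only).
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [2, 4, 6, 8, 10, 12],  # linear copy of "a", hence redundant
    "noise": [5, 1, 4, 2, 3, 6],
    "const": [7, 7, 7, 7, 7, 7],
}
label = [0, 0, 0, 1, 1, 1]

selected = filter_select(columns, label, k=2)
print(selected)
```

In this toy run only one of `a` and `a_copy` is kept, because they are perfectly correlated with each other, and the second slot goes to the next most relevant non-redundant feature. In a Spark setting the same idea would be applied to distributed data; MLlib ships its own selectors (for example `ChiSqSelector`), which this sketch does not use.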