WCSE 2020 SPRING ISBN: 978-981-14-4787-7
DOI: 10.18178/wcse.2020.02.005

Real-time Big Data Analytics for Feature Selection on Apache Spark

Lwin May Thant, Sabai Phyu

Abstract— Real-time data analysis is a key research topic in many domains. It can be applied to pre-existing or prescriptive models. Its practical benefit is the ability to monitor accounts and review actions in real time. MLlib, the Apache Spark machine learning library, can serve as a platform for real-time analysis, supporting feature extraction, transformation, and selection as well as classification, clustering, and frequent pattern mining. Feature selection identifies the most relevant features in a feature set and removes redundant data. Specifically, we use Apache Spark to analyze streaming time-series data with MLlib, efficiently extracting high-quality features to obtain a high-quality, high-performance model.

Index Terms— feature selection, Apache Spark, filter method, real-time data

Lwin May Thant, Sabai Phyu
University of Computer Studies, MYANMAR


Cite: Lwin May Thant, Sabai Phyu, "Real-time Big Data Analytics for Feature Selection on Apache Spark," Proceedings of 2020 the 10th International Workshop on Computer Science and Engineering (WCSE 2020), pp. 22-26, Yangon (Rangoon), Myanmar (Burma), February 26-28, 2020.