WCSE 2016
ISBN: 978-981-11-0008-6 DOI: 10.18178/wcse.2016.06.132

Hadoop Local Tasks Scheduling Optimization Algorithm Based on Logistic Regression Model

Shuai Renjun, Shen Yang, Chen Ping, Pan Jing, Dong Yanan

Abstract— When a TaskTracker has multiple local tasks available, the default scheduler executes them one after another in the order in which they are found, which is inefficient. To optimize local task scheduling, this paper presents a Hadoop local task scheduling optimization algorithm based on a logistic regression model. First, the relevant feature vectors of the local tasks are selected and defined. Then, using logistic regression as the machine learning model, these vectors are trained to obtain the weight of each feature, and the weights determine the task priority. The model is continuously updated according to overload rules. Experimental results show that the proposed algorithm improves map task data locality while also reducing job running time.
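The abstract does not list the selected features or the exact overload rules, so the following Python sketch illustrates only the general idea under stated assumptions. Each local task carries a hypothetical numeric feature vector, a logistic regression model maps that vector to a priority score, and the TaskTracker's local tasks are ordered by descending score instead of discovery order. The model is then updated online with one gradient step per observed outcome. The names `LocalTask` and `LogisticPriorityModel`, the example weights, and the binary feedback label that stands in for the paper's overload rules are all hypothetical. They are not taken from the paper or from Hadoop's API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class LocalTask:
    task_id: str
    # Hypothetical feature vector; the paper's actual features are not given here.
    features: list[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


@dataclass
class LogisticPriorityModel:
    weights: list[float]
    bias: float = 0.0
    learning_rate: float = 0.1

    def score(self, x: list[float]) -> float:
        """Priority score in (0, 1): sigmoid of the weighted feature sum."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def order(self, tasks: list[LocalTask]) -> list[LocalTask]:
        """Return local tasks sorted from highest to lowest priority."""
        return sorted(tasks, key=lambda t: self.score(t.features), reverse=True)

    def update(self, x: list[float], label: int) -> None:
        """One stochastic gradient step on the log loss.

        Assumption: label = 1 if running the task did not violate the
        (unspecified) overload rules, 0 otherwise.
        """
        error = label - self.score(x)
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias += self.learning_rate * error


if __name__ == "__main__":
    model = LogisticPriorityModel(weights=[0.8, -0.5])
    tasks = [
        LocalTask("t1", [0.2, 0.9]),
        LocalTask("t2", [0.9, 0.1]),
        LocalTask("t3", [0.5, 0.5]),
    ]
    print([t.task_id for t in model.order(tasks)])  # ['t2', 't3', 't1']
```

Because the sigmoid is monotonic, ordering by score is equivalent to ordering by the weighted sum. The sigmoid mainly matters for the online update, where the prediction error `label - score` drives each weight change.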

Index Terms— Hadoop, MapReduce, local tasks scheduling, task priority, overload rules, Logistic regression model.

Shuai Renjun, Shen Yang, Pan Jing, Dong Yanan
School of Computer Science and Technology, Nanjing Technology University, CHINA
Chen Ping
Nanjing Health Information Center, CHINA



Cite: Shuai Renjun, Shen Yang, Chen Ping, Pan Jing, Dong Yanan, "Hadoop Local Tasks Scheduling Optimization Algorithm Based on Logistic Regression Model," Proceedings of 2016 6th International Workshop on Computer Science and Engineering, pp. 738-742, Tokyo, 17-19 June, 2016.