Replication Based on Data Locality for Hadoop Distributed File System
Abstract— Replication plays an important role in storage systems: it improves data availability, throughput, and response time for users while keeping storage cost under control. Because data access patterns differ and the set of popular files is unstable and unpredictable, data popularity is an important factor in replication. Replica placement also matters for system performance. In data-parallel applications, data locality is a key issue, and poor data locality degrades system performance. Therefore, this paper proposes a data locality-based replication scheme for the Hadoop Distributed File System (HDFS). In replica allocation, data popularity is considered so that fewer replicas are maintained for unpopular data. The proposed replica placement algorithm also considers disk bandwidth, CPU utilization, and disk utilization in order to achieve better data locality and more effective storage utilization. The proposed scheme is expected to be effective for HDFS.
Index Terms— Replication, Data Locality, Data Popularity.
May Phyo Thu, Khine Moe Nwe, Kyar Nyo Aye
University of Computer Studies, MYANMAR
Cite: May Phyo Thu, Khine Moe Nwe, Kyar Nyo Aye, "Replication Based on Data Locality for Hadoop Distributed File System," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, pp. 663-667, Hong Kong, 15-17 June, 2019.