Mining Web Content Outliers for Improving the Quality of Search Results by using Mathematical Approaches

WCSE 2019 SPRING ISBN: 978-981-14-1455-8
DOI: 10.18178/wcse.2019.03.026

Thinzar Tun, Khin Mo Mo Tun

Abstract— The main task of Web mining is to help users retrieve relevant information from the web effectively and efficiently. Irrelevant and duplicated web pages lower the quality of search results and increase indexing space and time complexity, so providing high-quality, effective search results is a challenging task. Web content outlier mining focuses on mining outliers, such as irrelevant and redundant pages, from among the web pages in the same category. A mathematical approach, the Statistical Correlation Coefficient Approach combined with the Term Frequency-Inverse Document Frequency (TF.IDF) technique and a domain dictionary, is used to remove the irrelevant documents. Kendall's Tau rank correlation coefficient is then used to remove the redundant web documents and to retrieve ranked unique web documents. The proposed method achieves higher F1-measure and accuracy than existing methods.
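
As a rough illustration of the two building blocks named in the abstract, the following Python sketch computes TF.IDF weights for tokenized documents and Kendall's Tau between two score vectors. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the exact TF.IDF variant, how the domain dictionary is applied, or the similarity thresholds, so the function names `tf_idf` and `kendall_tau`, the length-normalized term frequency, the natural-log IDF, and the tau-a form (ties ignored) are all assumptions made here for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    `docs` is a list of token lists.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2).

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical toy documents, already tokenized.
    docs = [["web", "mining"], ["web", "outlier"]]
    print(tf_idf(docs))
    # Two term-weight vectors over a shared vocabulary (hypothetical values).
    print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))
```

In a pipeline of the kind the abstract describes, two documents whose term-weight vectors give a tau close to 1 would be candidates for redundancy; the cut-off value is a design choice that the abstract does not state.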

Index Terms— web content outliers, TF.IDF, Statistical correlation coefficient, Kendall's Tau rank correlation

Thinzar Tun
University of Information Technology, MYANMAR
Khin Mo Mo Tun
Faculty of Computing Department, University of Information Technology, MYANMAR


Cite: Thinzar Tun, Khin Mo Mo Tun, "Mining Web Content Outliers for Improving the Quality of Search Results by using Mathematical Approaches," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering WCSE_2019_SPRING, pp. 154-158, Yangon, Myanmar, February 27-March 1, 2019.
