Naïve Bayes Sentiment Analysis with Fixed and Variable Length Classes Training Data Sets
Abstract— The tremendous development in technology has led to an increasing number of people joining social networks to share information, opinions, and more. As a result, social networks have become a large and easily accessible source for capturing many people's opinions about particular topics. Many researchers have worked on extracting sentiment from various data sources, using different techniques and approaches. This work investigates the training dataset input, specifically the balance between positive and negative training documents. We categorized the training datasets into two types, which we term the Variable Length/size (VS) training dataset and the Fixed Length/size (FS) training dataset. In the FS dataset, the number of positive documents equals the number of negative documents. In the VS dataset, the number of positive documents is either greater than the number of negative documents (VS positive) or smaller (VS negative). A Binary Naïve Bayes classifier was trained on the FS, VS positive, and VS negative training datasets and evaluated on the test dataset. The results showed that it is better to use the FS training dataset, and that if the numbers of positive and negative texts must be unequal, the ratio of one class to the other should stay close to one, so that the imbalance is very small. We conclude that the wider the ratio between the class sizes, the less accurate the results, and the narrower the ratio, the more accurate the results.
Index Terms— Sentiment Analysis, Naïve Bayes, Supervised Machine Learning, Data Mining.
Saad Ibrahim Amaya, Dong Yuxin
Harbin Engineering University, Harbin, China
Cite: Saad Ibrahim Amaya, Dong Yuxin, "Naïve Bayes Sentiment Analysis with Fixed and Variable Length Classes Training Data Sets," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, pp. 274-279, Hong Kong, 15-17 June, 2019.
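The abstract compares a Binary Naïve Bayes classifier trained on FS, VS positive, and VS negative training sets, but gives no implementation details. The following is a minimal sketch of that experimental setup under stated assumptions, not the authors' code. It assumes that "Binary Naïve Bayes" means multinomial Naïve Bayes with each word counted at most once per document, plus add-one (Laplace) smoothing. The whitespace tokenizer and the helper names `tokenize`, `make_training_set`, `BinaryNaiveBayes`, and `accuracy` are hypothetical and exist only for illustration.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def make_training_set(pos_docs, neg_docs, n_pos, n_neg, seed=0):
    """Sample n_pos positive and n_neg negative documents.

    n_pos == n_neg gives a fixed-size (FS) set; n_pos > n_neg gives a
    VS-positive set and n_pos < n_neg a VS-negative set.
    """
    rng = random.Random(seed)
    pos = rng.sample(pos_docs, n_pos)
    neg = rng.sample(neg_docs, n_neg)
    return [(d, "pos") for d in pos] + [(d, "neg") for d in neg]


class BinaryNaiveBayes:
    """Multinomial Naive Bayes over word presence (counts clipped to 1 per
    document) with add-one smoothing. This reading of 'Binary Naive Bayes'
    is an assumption."""

    def fit(self, labeled_docs):
        self.classes = sorted({label for _, label in labeled_docs})
        doc_counts = Counter(label for _, label in labeled_docs)
        n_docs = len(labeled_docs)
        # Log prior log P(c) = log(N_c / N). In a VS set the larger class
        # receives the larger prior.
        self.log_prior = {
            c: math.log(doc_counts[c] / n_docs) for c in self.classes
        }
        word_counts = {c: Counter() for c in self.classes}
        for text, label in labeled_docs:
            # Binarization: each distinct word counts once per document.
            word_counts[label].update(set(tokenize(text)))
        self.vocab = set()
        for counts in word_counts.values():
            self.vocab.update(counts)
        self.log_likelihood = {}
        for c in self.classes:
            # Add-one smoothing: denominator is class word total + |V|.
            total = sum(word_counts[c].values()) + len(self.vocab)
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + 1) / total)
                for w in self.vocab
            }
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        words = set(tokenize(text)) & self.vocab
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][w] for w in words)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def accuracy(model, labeled_docs):
    # Fraction of documents whose predicted label matches the gold label.
    correct = sum(model.predict(text) == label for text, label in labeled_docs)
    return correct / len(labeled_docs)
```

To mirror the comparison in the abstract, one would build an FS set with `make_training_set(pos, neg, n, n)` and VS sets with unequal `n_pos` and `n_neg`, fit a model on each, and compare `accuracy` on the same held-out test set. The `log_prior` term is one mechanism through which a wide class ratio can push Naïve Bayes predictions toward the larger class.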