Sentiment analysis and review ranking using Naïve Bayes with weights in online social networks

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Brandon Joyce (Creator)
Institution
The University of North Carolina at Greensboro (UNCG)
Web Site: http://library.uncg.edu/
Advisor
Jing Deng

Abstract: Online reviews matter in many respects, to businesses and customers alike. Yet the accuracy and trustworthiness of these reviews are usually unsubstantiated, and little research has been performed to investigate them. During the 2016 US presidential election, many people expressed their likes or dislikes for particular presidential candidates on Twitter. Our aim was to calculate the sentiment expressed in these tweets and then compare it with polling data to see how strongly the two are correlated. We used a lexicon and the Naive Bayes machine learning algorithm to calculate the sentiment of political tweets collected over the one hundred days before the election. We used manually labeled tweets as well as tweets labeled automatically based on hashtag content/topic. Our results suggest that, compared with previous work, Twitter is becoming a more reliable platform. Furthermore, we use a set of Yelp reviews on various topics (food, hotels, etc.) as an example to perform sentiment analysis and investigate the correlation between the sentiment of a review's comment and its numeric rating. We used feature selection techniques to statistically remove redundant words from reviews, improving both run time and accuracy. Our method gives higher weight to terms/words appearing in reviews with more useful votes. Combined with a Naive Bayes approach, these techniques achieve an overall accuracy of 75%. More interestingly, our method performs well on 1-star and 5-star reviews, with 92% accuracy for the latter. Given such strong accuracy, we argue that the proposed sentiment analysis technique can be used to shed light on all online comments, especially those without numerical ratings.
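The abstract states that terms from reviews with more useful votes receive higher weight in a Naive Bayes classifier, but it does not give the weighting formula. The following is a minimal Python sketch of one way to do this in a multinomial Naive Bayes model, assuming each review's term counts are scaled by a hypothetical factor of 1 + (useful votes), class priors are left unweighted, and add-one (Laplace) smoothing is used. The function names `train_weighted_nb` and `predict`, the weighting formula, and the toy data are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(docs, alpha=1.0):
    """Train a multinomial Naive Bayes model with per-review term weights.

    docs: list of (tokens, label, useful_votes) tuples.
    ASSUMPTION (hypothetical): each review's term counts are scaled by 1 + useful_votes.
    """
    doc_count = Counter()               # unweighted documents per class (used for priors)
    term_weight = defaultdict(Counter)  # weighted term counts per class
    vocab = set()
    for tokens, label, useful_votes in docs:
        weight = 1.0 + useful_votes     # hypothetical weighting scheme
        doc_count[label] += 1
        for tok in tokens:
            term_weight[label][tok] += weight
            vocab.add(tok)

    n_docs = sum(doc_count.values())
    log_prior = {c: math.log(n / n_docs) for c, n in doc_count.items()}

    # Add-alpha (Laplace) smoothing over the shared vocabulary.
    log_lik = {}
    for c in doc_count:
        denom = sum(term_weight[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_weight[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def predict(model, tokens):
    """Return the class with the highest log-posterior; tokens outside the vocabulary are ignored."""
    log_prior, log_lik = model
    scores = {}
    for c, prior in log_prior.items():
        scores[c] = prior + sum(log_lik[c][t] for t in tokens if t in log_lik[c])
    return max(scores, key=scores.get)

# Toy example: (tokens, star rating, useful votes). Illustrative data only.
train = [
    (["great", "food", "friendly"], 5, 3),
    (["terrible", "service", "cold", "food"], 1, 1),
    (["great", "hotel"], 5, 0),
    (["rude", "terrible"], 1, 2),
]
model = train_weighted_nb(train)
print(predict(model, ["great", "friendly", "staff"]))  # 5
print(predict(model, ["rude", "cold"]))                # 1
```

Scaling counts keeps the sketch dependency-free. scikit-learn's `MultinomialNB.fit` accepts a `sample_weight` argument that gives a comparable per-review weighting, although there the weights also affect the class priors.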

Additional Information

Publication
Thesis
Language: English
Date: 2019
Keywords
Machine Learning, Naive Bayes Classifier, Natural Language Processing, Sentiment Analysis, Social Media, Social Networks
Subjects
Machine learning
Computational linguistics
Public opinion -- Data processing
Online social networks -- Data processing
