COMPARISON OF TOPIC MODELING METHODS FOR ANALYZING TWEETS ON COVID-19 VACCINE

ECU Author/Contributor (non-ECU co-authors, if there are any, appear on document)
Zeinab Khanjarinezhadjooneghani (Creator)
Institution
East Carolina University (ECU)
Web Site: http://www.ecu.edu/lib/

Abstract: Twitter is a microblogging site and a popular social media platform for sharing thoughts on current world events. The dynamic nature of Twitter discussions makes it a valuable data source for mining people's opinions and emotions about world events, and it can be used to analyze shifts in opinion and sentiment toward specific targets. The COVID-19 outbreak is one of the recent global events that has affected people's lives worldwide over the last two years, and many people share their feelings about and experiences of the pandemic through social media. COVID-19-related tweets have recently been the subject of several studies; this thesis likewise analyzes tweets related to the COVID-19 vaccine. Its main objective is to mine people's concerns about the COVID-19 vaccine using Twitter data. The thesis applies three topic modeling methods to discover the subjects discussed in relation to the COVID-19 vaccine and to analyze how those topics change over a specific period. The models are Latent Dirichlet Allocation (LDA), in both its standard form and with Gibbs sampling (LDA-Mallet), Nonnegative Matrix Factorization (NMF), and Top2Vec. The thesis then compares these models based on human judgment, coherence value, and topic uniqueness. The results show that both LDA variants outperformed NMF in terms of Jaccard score, and that LDA-Mallet outperformed LDA and NMF in terms of coherence score. For some experiments it is difficult to determine whether NMF or LDA achieved the better score, but overall NMF performed better than LDA in terms of coherence score. Top2Vec returned 255 topics for this case study, which is undesirable for the purposes of this study, and the other three models outperformed Top2Vec in terms of both Jaccard score and coherence value.
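The abstract compares models on topic uniqueness using a Jaccard score but does not state the exact formulation. The sketch below shows one common way such a score can be computed, under the assumption that each topic is represented by its list of top words and that uniqueness is summarized as the mean pairwise Jaccard similarity between topics (lower values mean topics share fewer words). The function names and the example topics are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(topics):
    """Average Jaccard similarity over every pair of topics.

    Assumption: each topic is given as a list of its top-N words.
    A lower value means the topics overlap less, i.e. they are more unique.
    """
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical top-4 words for three topics (illustrative only).
    topics = [
        ["vaccine", "dose", "pfizer", "shot"],
        ["vaccine", "mandate", "freedom", "choice"],
        ["side", "effect", "arm", "shot"],
    ]
    print(round(mean_pairwise_jaccard(topics), 4))  # prints 0.0952
```

Here the first two topics share only "vaccine" and the first and third share only "shot", so each of those pairs scores 1/7 and the remaining pair scores 0, giving a mean of 2/21. Whether a lower or higher value counts as "better" depends on how the score is defined, so this sketch should be read as an illustration rather than the thesis's evaluation code.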

Additional Information

Publication
Thesis
Language: English
Date: 2023
Subjects
Topic modeling; Social media analysis

This item references:

Title: COMPARISON OF TOPIC MODELING METHODS FOR ANALYZING TWEETS ON COVID-19 VACCINE
Location & Link: http://hdl.handle.net/10342/9414
Type of Relationship: The described resource references, cites, or otherwise points to the related resource.