Enriching an online suicidal dataset with active machine learning

UNCW Author/Contributor (non-UNCW co-authors, if there are any, appear on document)
Yang Song (Creator)
Yiyi Cici Yang (Creator)
The University of North Carolina Wilmington (UNCW)
Web Site: http://library.uncw.edu/

Abstract: The scarcity and often small size of annotated suicide-related datasets are among the main obstacles to automatically identifying online users at high risk of suicide. In this paper, we present a framework that uses active machine learning to annotate a mental-health-related textual dataset for suicide attempts and suicidal ideation in posts and comments. The approach starts from a relatively small annotated dataset and learns from a domain expert: when the active machine learning model cannot make a judgment, it obtains the expert's judgment on the most contradictory samples. When the model is confident enough about a new sample, it annotates that sample without asking for the expert's input. Each time a batch of new samples was annotated, the active machine learning models were evaluated and updated, which included tuning parameters, replacing models, or changing the representations of samples. The dataset comes from the SuicideWatch Reddit channel. We expanded it from an initial 200 manually annotated samples to 1,000.
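
The routing step described in the abstract, where confident predictions are annotated automatically and uncertain ones go to the expert, could be sketched roughly as follows. This is an illustration only, not the authors' implementation. The abstract does not name the model, the confidence threshold, or the query strategy, so the function names (`margin`, `route_batch`), the 0.9 threshold, the per-round expert budget, and the use of margin sampling to rank uncertain samples are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities (small gap = uncertain).

    Assumes at least two classes.
    """
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def route_batch(
    batch_probs: Sequence[Sequence[float]],
    confidence_threshold: float = 0.9,  # hypothetical value, not from the paper
    expert_budget: int = 2,             # hypothetical cap on expert queries per round
) -> Tuple[Dict[int, int], List[int]]:
    """Split a batch of predictions into auto-annotated and expert-queried samples.

    batch_probs[i] holds the model's class probabilities for sample i.
    Returns (auto_labels, expert_queue): auto_labels maps sample index to the
    predicted class; expert_queue lists sample indices, most uncertain first.
    """
    auto_labels: Dict[int, int] = {}
    uncertain: List[int] = []
    for idx, probs in enumerate(batch_probs):
        best_class = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best_class] >= confidence_threshold:
            # Model is confident enough: annotate without expert input.
            auto_labels[idx] = best_class
        else:
            uncertain.append(idx)
    # Margin sampling (an assumed strategy): ask about the closest calls first.
    uncertain.sort(key=lambda i: margin(batch_probs[i]))
    return auto_labels, uncertain[:expert_budget]


# Toy probabilities over three illustrative classes (none, ideation, attempt).
example_batch = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.50, 0.48, 0.02],
    [0.05, 0.92, 0.03],
]
auto_labels, expert_queue = route_batch(example_batch)
print(auto_labels)   # samples the model annotates on its own
print(expert_queue)  # samples sent to the domain expert
```

In a full loop under these assumptions, the expert's answers and the automatic labels would be added to the training set, the model would be retrained and re-evaluated, and the next batch would be routed the same way. Samples that are neither confident nor within the expert budget stay unlabeled for a later round.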

Additional Information

Language: English
Date: 2022
Keywords: Active Machine Learning, Suicide Prevention, Suicidal Ideation, Suicide Attempts
