An Unsupervised Learning Approach for Automatically to Categorize Potential Suicide Messages in Social Media

Abstract:

In this paper, we present an unsupervised learning approach to categorizing potential suicide messages in social media. The approach has five phases. The first two correspond to data acquisition and pre-processing, in which texts from a corpus for suicide detection are collected and converted into a structured format. In the third phase, the similarity between texts is computed using semantic similarity measures. In the fourth phase, traditional clustering algorithms are used to identify categories of potential suicide messages. In the last phase, validation metrics are used to verify how well the approach replicates the allocation of texts into categories in the original corpus data. Computational results show that the approach replicates the grouping of messages labeled 'No risk' and 'Risk' at average rates of 79 % and 87 %, and at rates of up to 13 % and 9 % in alert levels, for English and Spanish, respectively.
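The last three phases (similarity, clustering, validation) can be illustrated with a minimal sketch. It does not reproduce the paper's method: the abstract names neither the similarity measures, the clustering algorithms, nor the validation metrics, so bag-of-words cosine similarity, average-linkage agglomerative clustering, and cluster purity are stand-ins chosen here for illustration. The messages, labels, and function names are likewise invented.

```python
import math
import re
from collections import Counter

# Invented toy messages and labels (assumption: not taken from the paper's corpus).
texts = [
    "i feel hopeless and alone tonight",
    "i feel so hopeless and tired of everything",
    "feeling alone and hopeless again",
    "great football match with friends today",
    "watching a football match with my friends",
    "had a great day with friends at the match",
]
labels = ["Risk", "Risk", "Risk", "No risk", "No risk", "No risk"]


def tokenize(text):
    """Pre-processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters.

    Stand-in for a semantic similarity measure (assumption)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_linkage(sim, k):
    """Merge the two most similar clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(sim[a][b] for a, b in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster."""
    hits = sum(
        Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return hits / len(labels)


# Phase 3: pairwise similarity matrix.
bags = [Counter(tokenize(t)) for t in texts]
sim = [[cosine(a, b) for b in bags] for a in bags]

# Phase 4: group the messages into two categories.
clusters = average_linkage(sim, k=2)

# Phase 5: compare the grouping with the original labels.
score = purity(clusters, labels)
print(clusters, score)
```

On this toy data the two invented groups share no vocabulary, so the clusters coincide with the labels and purity is 1.0. Real messages would not separate this cleanly, which is why measures that capture meaning rather than word overlap matter in the third phase.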