Using Reddit data for multi-label text classification of Twitter users interests


Abstract:

Automatically inferring users' interest groups is a challenging task in social network research, with applications in marketing and recommendation systems. Manually labeling documents is difficult and expensive, but it is essential for training an automatic text classifier. Several existing approaches treat the problem as a multi-label prediction task. In this work, a methodology is proposed to automatically categorize data by combining Reddit and Twitter data. First, a dataset of 42,100 posts from the popular forum site Reddit is collected to train a model with labeled data. Then, a dataset of tweets from 1573 profiles, averaging 100 tweets per user, is collected to predict users' topics of interest with the trained model. Finally, the data were automatically categorized with an average precision of 75.62%.
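
The train-on-Reddit, predict-on-Twitter idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, so the example assumes TF-IDF features with a nearest-centroid cosine score as a stand-in. The toy posts, the topic names, the 0.2 threshold, and every function name are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenizer that keeps alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def fit_idf(token_lists):
    # Smoothed inverse document frequency learned from the labeled corpus.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def normalize(vec):
    # Scale a sparse vector (dict) to unit L2 norm.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(tokens, idf):
    # Unit-length TF-IDF vector; terms unseen during training are dropped.
    counts = Counter(t for t in tokens if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def train_centroids(posts, idf):
    # One centroid per topic: normalized sum of that topic's post vectors.
    sums = defaultdict(lambda: defaultdict(float))
    for text, topic in posts:
        for term, weight in tfidf(tokenize(text), idf).items():
            sums[topic][term] += weight
    return {topic: normalize(vec) for topic, vec in sums.items()}


def predict_topics(tweets, idf, centroids, threshold=0.2):
    # Multi-label step: keep every topic whose cosine score passes the threshold.
    user_vec = tfidf(tokenize(" ".join(tweets)), idf)
    scores = {
        topic: sum(w * centroid.get(term, 0.0) for term, w in user_vec.items())
        for topic, centroid in centroids.items()
    }
    return sorted(topic for topic, score in scores.items() if score >= threshold)


# Hypothetical labeled Reddit posts: (text, topic taken from the forum).
reddit_posts = [
    ("new gpu benchmark for gaming laptops", "technology"),
    ("python library release for data science", "technology"),
    ("the team won the football match last night", "sports"),
    ("marathon training plan for runners", "sports"),
    ("slow cooker chicken recipe with garlic", "food"),
]
idf = fit_idf([tokenize(text) for text, _ in reddit_posts])
centroids = train_centroids(reddit_posts, idf)

# Hypothetical tweets from a single user profile.
user_tweets = ["watching the football match tonight", "trying a new chicken recipe"]
topics = predict_topics(user_tweets, idf, centroids)
print(topics)
```

Pooling all of a user's tweets into one document is one possible design choice; classifying each tweet separately and aggregating the votes per user would be an equally plausible reading of the abstract.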

Year of publication:

2019

Keywords:

  • TF-IDF
  • Classification
  • LDA
  • Text
  • Word2Vec
  • Twitter
  • Reddit

Source:

Google
Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Computer science

Subject areas: