Using Reddit data for multi-label text classification of Twitter users' interests
Abstract:
Automatically inferring users' interest groups is a challenging task in social network research, with applications in marketing and recommendation systems. Manual labeling of documents is difficult and expensive, but it is essential for training an automatic text classifier. Currently, several approaches treat the problem as a multi-label prediction task. In this work, a methodology is proposed to automatically categorize data using both Reddit and Twitter data. First, a dataset of 42,100 posts from the popular forum site Reddit is collected to train a model with labeled data. Then, a dataset of tweets from 1573 profiles, with an average of 100 tweets per user, is collected to predict users' topics of interest with the trained model. Finally, we were able to automatically categorize data with an average precision of 75.62%.
Publication year:
2019
Keywords:
- TF-IDF
- classification
- LDA
- Text
- Word2Vec
Source:


Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Machine learning
- Computer science
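
The pipeline outlined in the abstract (train on labeled Reddit posts, then predict several topics of interest per Twitter user) can be illustrated with a minimal sketch. The abstract does not state which features or classifier the authors used, so everything below is an assumption: TF-IDF features are chosen only because they appear in the keywords, and the nearest-centroid scoring with a fixed threshold is a simple stand-in, not the paper's method. The toy posts, labels, tweets, and the names `fit_idf`, `fit_centroids`, and `predict_interests` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 norm; empty if the norm is 0."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(text, idf):
    """Unit-length TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def fit_centroids(posts, idf):
    """Build one unit-length centroid per label from (text, label) pairs."""
    sums = defaultdict(Counter)
    for text, label in posts:
        for term, weight in tfidf(text, idf).items():
            sums[label][term] += weight
    return {label: normalize(vec) for label, vec in sums.items()}


def predict_interests(tweets, idf, centroids, threshold=0.2):
    """Multi-label prediction: keep every label whose cosine similarity
    to the user's concatenated tweets reaches the (assumed) threshold."""
    user_vec = tfidf(" ".join(tweets), idf)
    scores = {
        label: sum(w * centroid.get(t, 0.0) for t, w in user_vec.items())
        for label, centroid in centroids.items()
    }
    labels = sorted(label for label, s in scores.items() if s >= threshold)
    return labels, scores


# Hypothetical labeled "Reddit" posts; the label plays the role of a subreddit topic.
reddit_posts = [
    ("new gpu benchmark for gaming laptop", "technology"),
    ("python library release improves compiler speed", "technology"),
    ("team wins championship after overtime goal", "sports"),
    ("striker scores goal in league match", "sports"),
    ("easy pasta recipe with garlic sauce", "food"),
    ("baking bread recipe with sourdough starter", "food"),
]

# Hypothetical tweets from a single user.
user_tweets = [
    "watching the league match tonight, what a goal",
    "trying a pasta recipe with garlic",
]

idf = fit_idf([text for text, _ in reddit_posts])
centroids = fit_centroids(reddit_posts, idf)
labels, scores = predict_interests(user_tweets, idf, centroids)
print(labels, scores)
```

Concatenating a user's tweets into one document is one plausible way to move from tweet-level text to a user-level prediction; scoring each tweet separately and aggregating the results would be an equally reasonable alternative.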