COVID-19 Fake News Detection Using Joint Doc2Vec and Text Features with PCA
Abstract:
During the current pandemic, it is imperative to stay up to date with the news, and many sources serve this purpose. However, misinformation and fake news also spread within society. In this work, a machine learning approach to detect fake news related to COVID-19 is developed. Specifically, the Doc2Vec language model is used to transform text documents into vector representations, and handcrafted features such as document length, the proportion of personal pronouns, and punctuation are included as complementary features. Then, Principal Component Analysis (PCA) is performed on the original feature vectors to reduce dimensionality. Both the original and the reduced data are fed to various machine learning models, which are then compared in terms of accuracy, precision, recall, and execution time. The results indicate that reducing the feature set had minimal impact on accuracy. However, execution times are greatly reduced in most cases, particularly at testing time, indicating that dimensionality reduction can be useful in projects already in production that need to run model inference on large volumes of documents to detect fake news.
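As a rough illustration of the handcrafted features named in the abstract, the following Python sketch computes document length, the proportion of personal pronouns, and the proportion of punctuation, then appends them to a document embedding. The pronoun list, the whitespace tokenization, the exact feature definitions, and the function names `handcrafted_features` and `joint_vector` are assumptions made for this example; the abstract does not specify how the authors computed these features.

```python
import string
from typing import List, Sequence

# Assumed pronoun list (hypothetical); the abstract does not give the one used.
PERSONAL_PRONOUNS = {
    "i", "me", "we", "us", "you", "he", "him",
    "she", "her", "it", "they", "them",
}


def handcrafted_features(text: str) -> List[float]:
    """Return [word count, personal-pronoun proportion, punctuation proportion]."""
    # Whitespace tokenization with surrounding punctuation stripped (an assumption).
    words = [t.strip(string.punctuation).lower() for t in text.split()]
    words = [w for w in words if w]
    n_words = len(words)

    n_pronouns = sum(w in PERSONAL_PRONOUNS for w in words)
    n_punct = sum(ch in string.punctuation for ch in text)

    return [
        float(n_words),
        n_pronouns / n_words if n_words else 0.0,
        n_punct / len(text) if text else 0.0,
    ]


def joint_vector(doc_embedding: Sequence[float], text: str) -> List[float]:
    """Concatenate a document embedding with the handcrafted features."""
    return list(doc_embedding) + handcrafted_features(text)
```

In a full pipeline, `doc_embedding` would come from a trained Doc2Vec model, and the resulting joint vectors would be passed through PCA before classification; those steps are omitted here to keep the sketch free of external dependencies.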
Publication year:
2022
Keywords:
- fake news
- Principal Component Analysis
- Text classification
Source:
Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Machine learning
- Computer science
Subject areas:
- Library and archive operations
- Language
- Medicine and health