Evaluating Pre-trained Word Embeddings and Neural Network Architectures for Sentiment Analysis in Spanish Financial Tweets


Abstract:

Sentiment Analysis supports decision making in the financial domain by providing rapid insight into companies' brand image and the perceived quality of their services and products in the market. The state of the art in Sentiment Analysis relies on word embeddings pre-trained on large unannotated corpora, which capture rich and meaningful properties of words together with their semantic relationships. These representations are fed to a neural network that learns to distinguish among positive, negative, and neutral texts. However, although pre-trained word embeddings have been applied to different domains and languages, to the best of our knowledge there are no studies of their reliability in the financial domain in Spanish. Consequently, we compiled and labelled a corpus of 7,435 tweets from economists and financial news sites, and we evaluated the performance of different pre-trained word embeddings, several well-known neural network architectures, and linguistic features. Our results indicate that the fastText model trained on the Spanish Unannotated Corpora, combined with linguistic features, achieved the best accuracy, 58.036%, using a Gated Recurrent Unit. As an additional contribution, the compiled corpus was released to the scientific community.
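The pipeline described in the abstract (word embeddings fed to a Gated Recurrent Unit, with linguistic features joined to the recurrent output before a three-way classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tiny vocabulary, the random vectors standing in for pre-trained fastText embeddings, the two toy linguistic features, the layer sizes, and the untrained random weights. Bias terms are omitted for brevity, and the state update uses one of the two common GRU conventions.

```python
import math
import random

random.seed(0)

EMB_DIM, HID_DIM, FEAT_DIM = 8, 6, 2
LABELS = ["negative", "neutral", "positive"]


def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical vocabulary. The vectors are random placeholders,
# NOT real fastText embeddings; the last row is used for unknown words.
VOCAB = {"la": 0, "bolsa": 1, "sube": 2, "cae": 3, "hoy": 4}
EMBEDDINGS = rand_matrix(len(VOCAB) + 1, EMB_DIM)

# GRU parameters for the update gate (z), reset gate (r) and candidate state (h).
W = {gate: rand_matrix(HID_DIM, EMB_DIM) for gate in "zrh"}
U = {gate: rand_matrix(HID_DIM, HID_DIM) for gate in "zrh"}
W_OUT = rand_matrix(len(LABELS), HID_DIM + FEAT_DIM)


def gru_step(x, h):
    """One GRU time step (biases omitted)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(W["h"], x), matvec(U["h"], reset_h))]
    # Convention used here: z weights the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def linguistic_features(tokens):
    """Toy stand-ins for linguistic features (assumed, not the paper's set)."""
    return [len(tokens) / 10.0, float(any("!" in t for t in tokens))]


def predict_proba(text):
    """Embed each token, run the GRU, append features, apply softmax."""
    tokens = text.lower().split()
    h = [0.0] * HID_DIM
    for token in tokens:
        h = gru_step(EMBEDDINGS[VOCAB.get(token, len(VOCAB))], h)
    logits = matvec(W_OUT, h + linguistic_features(tokens))
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}


probs = predict_proba("La bolsa sube hoy")
```

Because the weights are random and untrained, `probs` only shows the shape of the output: one probability per sentiment class, summing to one. In practice the embedding table would be loaded from a pre-trained model and the remaining weights learned from a labelled corpus.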

Publication year:

2020

Keywords:

  • deep learning
  • sentiment analysis
  • word embeddings

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Language

Subject areas:

  • Computer programming, programs, data, security
  • Language