Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network


Abstract:

The present work uses Hadoop as the core processing in the Big Data environment. There are several open sources tools from the Hadoop ecosystem that facilitate the processing of enormous volumes of data. In this paper, we have worked with Apache Flume and Apache Hive tools for the study case of the 2017 presidential elections in Ecuador. The analysis of data generated from Twitter social network focuses mainly in the first round of balloting of Ecuador’s 2017 presidential election. These generated data have been obtained, stored, processed and analyzed to comply with the characteristics of the information that is considered Big Data. The selected tools have been evaluated in their architecture, installation, and use. Finally, the data have been evaluated under certain quality criteria or dimensions.

Año de publicación:

2019

Keywords:

  • Apache Flume
  • BIG DATA
  • Twitter

Fuente:

scopusscopus

Tipo de documento:

Conference Object

Estado:

Acceso restringido

Áreas de conocimiento:

  • Big data
  • Análisis de redes sociales
  • Ciencias de la computación

Áreas temáticas:

  • Métodos informáticos especiales
  • Economía
  • Funcionamiento de bibliotecas y archivos