Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network

Abstract:

This work uses Hadoop as the core processing platform in a Big Data environment. The Hadoop ecosystem includes several open-source tools that facilitate the processing of enormous volumes of data. In this paper, we work with Apache Flume and Apache Hive, using the 2017 presidential elections in Ecuador as a case study. The analysis of data generated on the Twitter social network focuses mainly on the first round of balloting in that election. These data were obtained, stored, processed, and analyzed in accordance with the characteristics of information that is considered Big Data. The selected tools were evaluated in terms of their architecture, installation, and use. Finally, the data were evaluated against a set of quality criteria, or dimensions.