Data Lake Management for Educational Analysis
Abstract:
This article presents an approach to managing an educational analytics system built on a data lake. The solution addresses the need of higher education institutions to manage the large volumes of data generated by their students and teachers. It tackles the lack of organization that commonly arises when a data lake is implemented, which stems from the absence of well-known or standardized methods for its administration. Our methodology divides the data lake into three zones, (1) a landing tier, (2) a staging tier, and (3) a consumption tier, and transforms the data for each zone under the guidance of the Common Data Model and One Data Model. The main goal is to prevent the educational data lake from turning into a data swamp. The methodology was implemented as a case study at a university on an open-source data lake environment. The results show that the barriers to historical data analysis are overcome thanks to the capabilities of the data lake. In addition, the approach can be applied flexibly to other institutions, using commodity solutions and regardless of the source data format.
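The three-zone layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the root path `/datalake`, the `Dataset` fields, the helper names, and the field mapping are all hypothetical, and the renaming step only stands in for a transformation toward a shared model such as the Common Data Model.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Zone names follow the abstract; their order defines the promotion path.
ZONES = ("landing", "staging", "consumption")


@dataclass(frozen=True)
class Dataset:
    source: str            # hypothetical source system, e.g. "lms"
    name: str              # hypothetical dataset name, e.g. "grades"
    zone: str = "landing"  # every dataset starts in the landing tier


def zone_path(ds: Dataset, root: str = "/datalake") -> PurePosixPath:
    """Build the directory for a dataset; the layout is an assumption."""
    return PurePosixPath(root) / ds.zone / ds.source / ds.name


def promote(ds: Dataset) -> Dataset:
    """Move a dataset one tier forward, never skipping a zone."""
    idx = ZONES.index(ds.zone)
    if idx == len(ZONES) - 1:
        raise ValueError(f"{ds.name} is already in the consumption tier")
    return Dataset(ds.source, ds.name, ZONES[idx + 1])


def to_common_schema(raw: dict, mapping: dict) -> dict:
    """Rename source-specific fields to shared names and drop unmapped ones."""
    return {target: raw[src] for src, target in mapping.items() if src in raw}
```

Forcing every dataset through `promote` one tier at a time is one way to keep the lake organized: nothing reaches the consumption tier without first passing through staging, where a step like `to_common_schema` would run.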
Publication year:
2022
Keywords:
- one data model
- Education
- data lake
- HDFS
- BIG DATA
- common data model
Source:
Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Data analysis
- Computer science
Subject areas:
- Library and archive operations