Data cleaning technique for security big data ecosystem


Abstract:

The growth of information networks has given rise to an ever-multiplying number of security threats, which is why some information networks now incorporate a Computer Security Incident Response Team (CSIRT) responsible for monitoring all events that occur in the network, especially those affecting data security. Thousands or even millions of events can occur every day, and handling such an amount of information requires a robust infrastructure. Many commercial solutions are available to process this kind of information; however, they are either expensive or cannot cope with such volume. Furthermore, and most importantly, security information is by nature confidential and sensitive, so companies should opt to process it internally. Taking as a case study a university's CSIRT responsible for 10,000 users, we propose a security Big Data ecosystem to process a high data volume and guarantee confidentiality. During implementation, one of the first challenges was the cleaning phase after data extraction, where we observed that some data could be safely ignored without affecting the quality of the results, thereby reducing storage requirements. For this cleaning phase, we propose an intuitive technique and a comparative proposal based on the Fellegi-Sunter theory.

Publication year:

2017

Keywords:

  • Ecosystem
  • Big data
  • Data
  • Cleaning
  • Security

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Big data
  • Computer science

Subject areas:

  • Computer science