Mathematically optimized, recursive prepartitioning strategies for k-anonymous microaggregation of large-scale datasets


Abstract:

This work falls within the field of statistical disclosure control (SDC), which concerns the postprocessing of the demographic portion of the statistical results of surveys containing sensitive personal information, in order to effectively safeguard the anonymity of the participating respondents. A widely known technique for protecting the privacy of respondents, beyond the mere suppression of their identifiers, is k-anonymous microaggregation. Unfortunately, most microaggregation algorithms that produce competitively low levels of distortion exhibit a superlinear running time, typically scaling with the square of the number of records in the dataset. This work proposes and analyzes an optimized prepartitioning strategy that significantly reduces the running time of k-anonymous microaggregation on large datasets, with a mild loss in data utility with respect to that of MDAV, the underlying method. The optimization strategy is based on prepartitioning a dataset recursively until the desired k-anonymity parameter is achieved. Traditional microaggregation algorithms have quadratic computational complexity, of the form Θ(n^2), where n is the number of records. By using the proposed method and fixing the number of recursive prepartitions, we obtain subquadratic complexity, of the form Θ(n^(3/2)), Θ(n^(4/3)), and so on, depending on the number of prepartitions. Alternatively, by fixing the ratio between the size of the microcell and that of the macrocell at each prepartition, quasilinear complexity of the form Θ(n log n) is achieved. Our method is readily applicable to large-scale datasets with numerical demographic attributes.
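
The complexity figures quoted in the abstract can be reproduced with a small cost model. The sketch below is illustrative only and is not taken from the article: it assumes that splitting a cell of m records into subcells of size s costs on the order of m^2/s operations (a common estimate for MDAV-style algorithms), and that the intermediate cell sizes are spaced geometrically between n and k. The function names are hypothetical.

```python
import math


def direct_cost(n, k):
    """Assumed cost of microaggregating n records into cells of size k
    with no prepartitioning: on the order of n^2 / k operations."""
    return n * n / k


def prepartitioned_cost(n, k, prepartitions):
    """Assumed cost with a fixed number of recursive prepartitions.

    Cell sizes shrink geometrically from n down to k over
    (prepartitions + 1) steps. Under the m^2 / s assumption, each step
    costs n * ratio in total, where ratio is the size ratio between a
    cell and its subcells.
    """
    steps = prepartitions + 1
    ratio = (n / k) ** (1.0 / steps)
    return steps * n * ratio


def fixed_ratio_cost(n, k, ratio):
    """Assumed cost when the macrocell-to-microcell size ratio is fixed.

    The number of steps then grows like log(n / k), so the total cost
    is on the order of n log n. The step count is left fractional,
    since this is a cost model and not an implementation.
    """
    steps = math.log(n / k) / math.log(ratio)
    return steps * n * ratio
```

Under these assumptions, zero prepartitions recovers the quadratic cost, one prepartition gives a cost proportional to n^(3/2), two give n^(4/3), and a fixed ratio gives a cost proportional to n log n, in line with the orders stated in the abstract.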

Publication year:

2020

Keywords:

  • Optimized prepartitioning
  • Microaggregation
  • Data privacy
  • Large-scale datasets
  • Statistical disclosure control
  • k-anonymity

Source:

Scopus
Google

Document type:

Article

Status:

Restricted access

Knowledge areas:

  • Computer science
  • Mathematical optimization

Subject areas:

  • Computer science