A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data
Abstract:
The generation of massive data in the digital age creates opportunities for violations of individual privacy, and searches for personal data expose individuals with increasing frequency. The present work belongs to the area of differential privacy, which guarantees data confidentiality and robustness against invasive identification attacks. This area stands out in the literature for its rigorous mathematical basis, which makes it possible to quantify the loss of privacy. A differentially private method based on regression models has been developed to prevent inversion attacks while retaining model efficacy. In this paper, we propose a novel approach to improving data privacy based on regression models under heteroscedasticity, an aspect that is common in practical applications of differential privacy but has not been studied. We evaluate the influence of the privacy restriction on the statistical performance of the estimators of the model parameters using Monte Carlo simulations, including a study of test rejection rates for the proposed approach. The numerical results show high inferential distortion under stricter privacy restrictions. Empirical illustrations with real-world data are presented to show potential applications.
Publication year:
2023
Keywords:
- Confidentiality
- Linear and logistic regressions
- Monte Carlo simulation
- Anonymity
- Perturbations of data
- Data breach and fitting
- Statistical consistency and modeling
Source:
- Scopus
- Google
Document type:
Article
Status:
Restricted access
Knowledge areas:
- Machine learning
- Computer science
- Statistics
Subject areas:
- Computer programming, programs, data, security
- Technology (Applied sciences)
- Collections of general statistics