On the proper use of the Pearson correlation coefficient: Definitions, properties and assumptions
Abstract:
The Pearson correlation coefficient is widely used across many areas of scientific work, from technical, econometric, and engineering studies to research in the social, behavioral, and health sciences. This widespread use is one of the reasons the tool is so often misused, especially when its value must be interpreted correctly or when the mathematical assumptions that support it must be checked. One example is the assumption that correlation implies causation, a confusion that occurs frequently among both novice and experienced researchers. But perhaps the major source of error lies in checking assumptions such as normality, which is typically verified only at the univariate level while its bivariate verification is omitted, possibly because of a lack of knowledge or because it requires more complex techniques. A similar situation arises in outlier detection. Here it is common to use box-and-whisker plots to identify extreme values in each variable separately, when the appropriate approach would be to use procedures that calculate the distance separating each observation from the center of the data, taking into account all of its vector-space components. This review is offered as a contribution to clarifying these doubts and as a methodological guide to verifying such assumptions; it addresses the mathematical aspects in a general manner while emphasizing the alternatives available for this type of analysis.
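The contrast the abstract draws between per-variable box plots and a distance measured from the center of the data can be illustrated with a minimal sketch. It assumes the squared Mahalanobis distance as the multivariate distance (the abstract does not name a specific procedure), and the data and the function names `pearson_r` and `mahalanobis_sq` are hypothetical, chosen only for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the sample centroid.

    Assumption: this is one common choice of multivariate distance,
    not necessarily the procedure recommended in the article.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)       # sample covariance (ddof=1)
    inv_cov = np.linalg.inv(cov)
    # Row-wise quadratic form: d_i^2 = c_i^T S^{-1} c_i
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Hypothetical data: nine points near the line y = x plus one point (5, 1.0)
# whose coordinates are unremarkable in each variable taken separately.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 5.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1, 9.0, 1.0])

r = pearson_r(x, y)
d2 = mahalanobis_sq(np.column_stack([x, y]))
most_extreme = int(np.argmax(d2))          # index of the farthest observation

print(f"r = {r:.3f}")
print(f"most extreme observation: index {most_extreme}")
```

In this sketch the last observation lies inside the range of both variables, so a box plot of either variable alone would not single it out, yet it has the largest distance once both components are considered jointly. Any cutoff used to declare such a point an outlier would be a further assumption and is deliberately left out here.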
Year of publication:
2018
Keywords:
- Coefficient
- Bivariate normality
- Correlation
- Assumptions
- Pearson
- Multivariate outliers
Source:
Document type:
Article
Status:
Restricted access
Knowledge areas:
- Statistics
Subject areas:
- General principles of mathematics
- Mathematics
- Social sciences