Popularity Bias in False-positive Metrics for Recommender Systems Evaluation

Abstract:

We investigate the impact of popularity bias on false-positive metrics in the offline evaluation of recommender systems. Unlike their true-positive complements, false-positive metrics reward systems that minimize recommendations disliked by users. Our analysis is, to the best of our knowledge, the first to show that false-positive metrics tend to penalize popular items, the opposite of the behavior of true-positive metrics, so that the two types of metrics tend to disagree in the presence of popularity biases. We present a theoretical analysis that identifies why the metrics disagree and determines the rare situations in which they might agree. The key lies in the relationship between the popularity and relevance distributions, in terms of their agreement and steepness, two fundamental concepts that we formalize. We then examine three well-known datasets using multiple popular true- and false-positive metrics on 16 recommendation algorithms. The datasets are chosen specifically because they allow us to estimate both biased and unbiased metric values. The results of the empirical study confirm and illustrate our analytical findings. Having established the conditions under which the two types of metrics disagree, we then determine the circumstances in which researchers should use true-positive or false-positive metrics in the offline evaluation of recommender systems.
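
To make the distinction between the two families of metrics concrete, the following is a minimal sketch, not the paper's own definitions. It contrasts a standard true-positive metric (precision at k) with a simple false-positive counterpart that counts disliked items in the top k. The function name `false_positive_share_at_k` and the toy data are hypothetical, introduced only for illustration.

```python
def precision_at_k(ranking, liked, k):
    """True-positive metric: share of the top-k items the user liked (higher is better)."""
    top_k = ranking[:k]
    return sum(1 for item in top_k if item in liked) / k


def false_positive_share_at_k(ranking, disliked, k):
    """Hypothetical false-positive counterpart: share of the top-k items
    the user disliked (lower is better)."""
    top_k = ranking[:k]
    return sum(1 for item in top_k if item in disliked) / k


# Toy data, made up for illustration: item "d" has no observed rating.
liked = {"a", "c"}
disliked = {"b"}
ranking = ["a", "b", "d", "c"]

tp = precision_at_k(ranking, liked, k=3)                # one liked item ("a") in the top 3
fp = false_positive_share_at_k(ranking, disliked, k=3)  # one disliked item ("b") in the top 3
```

In this sketch an unrated item such as "d" counts toward neither metric. Assuming that popular items accumulate more observed ratings, both positive and negative, this is one intuition for why the two kinds of metric can pull in opposite directions.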