An Evaluation of Machine Learning Approaches to Integrate Historical Farm Data
Abstract:
Large agricultural datasets are increasingly available through yearly surveys. However, very few longitudinal datasets that provide insight into farmers' decisions are available. The main objective of this research is to match farm establishments. The purpose of this investigation is twofold: first, to match successive yearly surveys, producing longitudinal records of farm history; and second, to use only categorical and numerical features to match records. We analyzed Ecuadorian national agricultural surveys from 2010 to 2012. In total, 125,098 records were compared using 16 different algorithms. Our results suggest that, with this particular data setup, unsupervised methods using a stochastic matching approach outperform the other algorithms in terms of F1 score. Matching individuals over three consecutive years shows that ensemble techniques allowed the re-identification of 60% of individuals. In Ecuador, no data are currently available to follow individual farms over time, so longitudinal datasets could provide essential insights for local policies.
Year of publication:
2022
Keywords:
- Entity Resolution
- Data integration
- Farm Matching
- Data matching
- Machine learning
- Record Linkage
Source:


Document type:
Article
Status:
Open access
Knowledge areas:
- Machine learning
Subject areas:
- Computer science