Tracking Major Sources of Water Contamination Using Machine Learning


Abstract:

Current microbial source tracking techniques that rely on grab samples analyzed by individual endpoint assays are inadequate to explain microbial sources across space and time. Modeling and pbkp_redicting host sources of microbial contamination could add a useful tool for watershed management. In this study, we tested and evaluated machine learning models to pbkp_redict the major sources of microbial contamination in a watershed. We examined the relationship between microbial sources, land cover, weather, and hydrologic variables in a watershed in Northern California, United States. Six models, including K-nearest neighbors (KNN), Naïve Bayes, Support vector machine (SVM), simple neural network (NN), Random Forest, and XGBoost, were built to pbkp_redict major microbial sources using land cover, weather and hydrologic variables. The results showed that these models successfully pbkp_redicted microbial sources classified into two categories (human and non-human), with the average accuracy ranging from 69% (Naïve Bayes) to 88% (XGBoost). The area under curve (AUC) of the receiver operating characteristic (ROC) illustrated XGBoost had the best performance (average AUC = 0.88), followed by Random Forest (average AUC = 0.84), and KNN (average AUC = 0.74). The importance index obtained from Random Forest indicated that precipitation and temperature were the two most important factors to pbkp_redict the dominant microbial source. These results suggest that machine learning models, particularly XGBoost, can pbkp_redict the dominant sources of microbial contamination based on the relationship of microbial contaminants with daily weather and land cover, providing a powerful tool to understand microbial sources in water.

Año de publicación:

2020

Keywords:

  • Machine learning
  • XGBoost
  • Land use
  • Microbial source tracking
  • Fecal contamination
  • rainfall

Fuente:

scopusscopus

Tipo de documento:

Article

Estado:

Acceso abierto

Áreas de conocimiento:

  • Recursos hídricos
  • Aprendizaje automático
  • Ciencias de la computación

Áreas temáticas:

  • Otros problemas y servicios sociales