Decision tree method for the classification of chemical pollutants: Incorporation of across-chemical variability and within-chemical uncertainty


Abstract:

We have developed a decision tree methodology for classifying chemicals by estimates of potential human exposure. A decision tree is constructed as follows. Monte Carlo simulations are conducted by randomly sampling chemical and environmental properties whose ranges of values represent the variability of parameters across a defined set of chemicals and environmental conditions. The tree structure is then defined by a series of constraints placed on the various chemical and environmental properties using the Classification and Regression Tree (CART) algorithm. Each node of the tree is considered a bin that classifies chemicals whose properties are consistent with the parametric constraints associated with that node, and each bin is also associated with a human exposure level. In this manner, the tree structure functions as a template from which a set of chemicals is classified into parametric regions that are associated with an exposure level. Three important properties of this classification approach are as follows: (a) The variability across the chemical set is described by the template. (b) Parameter correlations are described by assessing which bins are represented by at least one chemical. (c) The sensitivity of the classification is assessed using both the uncertainty of the values for a particular chemical and any uncertainty or variability associated with site-specific exposure and environmental properties. To illustrate these properties, a case study was conducted in which exposures were estimated using the multimedia exposure model CalTOX, assuming a regional chemical release into soil. A decision tree template was constructed and then used to classify 79 chemicals.
Analysis of the simulation outputs identified 4 of 14 chemical properties whose value ranges played the dominant role in classifying chemicals into exposure ranges (R² = 0.78); that is, 78% of the exposure variation in the data could be explained using only 4 of the 14 chemical properties known to affect chemical fate and transport. The most important classifier was the half-life in root-zone soil, τ(s). In addition, a sensitivity analysis of 93 site-specific environmental and exposure properties suggested that only four of these factors influenced the classification.
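
The construction described in the abstract (Monte Carlo sampling of properties, then a regression tree whose leaves act as exposure bins) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_exposure` is a hypothetical stand-in and not CalTOX, the two sampled properties and their ranges are invented, and the tree is a bare-bones variance-reduction splitter rather than the full CART procedure used in the study.

```python
import math
import random


def toy_exposure(props):
    """Hypothetical exposure model (NOT CalTOX): log10 exposure from two properties."""
    half_life_days, log_kow = props
    return math.log10(half_life_days) - 0.3 * log_kow


def sample_properties(rng):
    # Illustrative ranges only: log-uniform soil half-life (days), uniform log Kow.
    return (10 ** rng.uniform(0, 3), rng.uniform(0, 6))


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (feature, threshold) minimizing the summed squared error, or None."""
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        for k in range(1, len(order)):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue
            score = sse([y[i] for i in order[:k]]) + sse([y[i] for i in order[k:]])
            if best is None or score < best[0]:
                best = (score, j, (lo + hi) / 2)
    return None if best is None else (best[1], best[2])


def grow(X, y, depth):
    """Recursively build the tree; each leaf (bin) stores a mean exposure."""
    split = best_split(X, y) if depth > 0 and len(y) >= 10 else None
    if split is not None:
        j, thr = split
        left = [i for i in range(len(y)) if X[i][j] <= thr]
        right = [i for i in range(len(y)) if X[i][j] > thr]
        if left and right:
            return {
                "feature": j,
                "threshold": thr,
                "left": grow([X[i] for i in left], [y[i] for i in left], depth - 1),
                "right": grow([X[i] for i in right], [y[i] for i in right], depth - 1),
            }
    return {"exposure": sum(y) / len(y)}


def predict(tree, props):
    """Follow the parametric constraints down to a bin and return its exposure."""
    while "exposure" not in tree:
        branch = "left" if props[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["exposure"]


def r_squared(tree, X, y):
    """Fraction of exposure variance explained by the tree on the given samples."""
    residual = sum((yi - predict(tree, xi)) ** 2 for xi, yi in zip(X, y))
    return 1.0 - residual / sse(y)


rng = random.Random(0)
X = [sample_properties(rng) for _ in range(300)]   # Monte Carlo samples
y = [toy_exposure(x) for x in X]                   # simulated exposures
tree = grow(X, y, depth=3)                         # the classification template
r2 = r_squared(tree, X, y)
print(f"R^2 on the simulated samples: {r2:.2f}")
```

Classifying a chemical then amounts to calling `predict(tree, props)` with its property values; repeating that call over values drawn from the chemical's uncertainty ranges would show how often it lands in each bin.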

Publication year:

1998

Keywords:

    Source:

    Scopus

    Document type:

    Article

    Status:

    Restricted access

    Knowledge areas:

    • Environmental chemistry
    • Machine learning

    Subject areas:

    • Physical chemistry
    • Analytical chemistry
    • Ecology