Evaluation of a black-box estimation tool: A case study

Abstract:

Over the past 30 years, various estimation models and tools have been developed to help managers perform estimation tasks. Some of these tools date from the late 1970s and have been progressively modernized by their vendors, with updated user interfaces and new functions that support not only project estimation but also detailed project planning. For organizations interested in using such tools, it is crucial to know their predictive performance. However, it is not industry practice for vendors to document the performance of these commercial estimation tools: tool builders have not reported how their models perform against their own initial data repositories, nor how subsequent versions perform. In effect, such estimation tools are often black boxes with undocumented performance properties. Various researchers have attempted to analyze the performance of such black-box estimation tools, but only within the constraints of research data sets that were fairly small compared to the larger ones claimed by tool vendors. The research presented here revisits this issue with a much larger data set from the International Software Benchmarking Standards Group (ISBSG). The study is presented in three steps. First, the data set is analyzed by programming language and the corresponding subsamples are identified, along with obvious outliers with respect to effort and size. Second, estimation models are built directly from these samples, in a white-box fashion, with and without outliers. Third, a commercial software estimation tool widely distributed throughout the world is tested against the same set of samples. In summary, for the majority of the samples available, the black-box tool fares fairly poorly. 
Lessons learned are of two types: prospective tool users should demand that tool vendors benchmark their black-box tools against publicly available repositories; and the confidence interval of the output provided by a tool, as well as the basis for that output (e.g. in terms of both the number of observations and the currency of the underlying data), must be documented. Copyright © 2007 John Wiley & Sons, Ltd.
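
The three-step procedure described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the project records are synthetic, the outlier rule (effort-to-size ratio above three times the sample median) is hypothetical, the white-box model is a plain least-squares line, the accuracy measure is the mean magnitude of relative error (MMRE), and `black_box_estimate` is a stand-in for a commercial tool whose internals are unknown.

```python
from collections import defaultdict
from statistics import mean, median

# Synthetic records for illustration only: (language, size, effort in hours).
PROJECTS = [
    ("COBOL", 100, 1000), ("COBOL", 200, 2000), ("COBOL", 300, 3000),
    ("COBOL", 400, 4000), ("COBOL", 250, 20000),  # last one is an obvious outlier
    ("Java", 100, 800), ("Java", 200, 1600), ("Java", 300, 2400),
]


def split_by_language(projects):
    """Step 1a: group (size, effort) pairs into one subsample per language."""
    samples = defaultdict(list)
    for language, size, effort in projects:
        samples[language].append((size, effort))
    return dict(samples)


def drop_outliers(sample, factor=3.0):
    """Step 1b (hypothetical rule): drop projects whose effort/size ratio
    exceeds `factor` times the median ratio of the subsample."""
    limit = factor * median(effort / size for size, effort in sample)
    return [(size, effort) for size, effort in sample if effort / size <= limit]


def fit_linear(sample):
    """Step 2: white-box model, least-squares fit of effort = a + b * size."""
    mean_size = mean(size for size, _ in sample)
    mean_effort = mean(effort for _, effort in sample)
    sxx = sum((size - mean_size) ** 2 for size, _ in sample)
    sxy = sum((size - mean_size) * (effort - mean_effort) for size, effort in sample)
    slope = sxy / sxx
    return mean_effort - slope * mean_size, slope


def mmre(sample, estimate):
    """Mean magnitude of relative error of `estimate(size)` over a subsample."""
    return mean(abs(effort - estimate(size)) / effort for size, effort in sample)


def black_box_estimate(size):
    """Hypothetical stand-in for a commercial tool; not any real product."""
    return 15.0 * size


def compare(projects):
    """Step 3: score both estimators on the same subsamples, outliers removed."""
    report = {}
    for language, sample in split_by_language(projects).items():
        clean = drop_outliers(sample)
        a, b = fit_linear(clean)
        report[language] = {
            "white_box_mmre": mmre(clean, lambda size: a + b * size),
            "black_box_mmre": mmre(clean, black_box_estimate),
        }
    return report


REPORT = compare(PROJECTS)
```

Calling `fit_linear(sample)` on a subsample before `drop_outliers` gives the "with outliers" variant of the model, so both variants can be compared on the same data. Note that the white-box model is scored here on the data it was fitted to, which flatters it; a hold-out sample would give a fairer comparison.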