Predicting Bugs by Analyzing Software History
Slides
Speaker: Jim Whitehead, UC Santa Cruz/ISR
Abstract:
Almost all software contains undiscovered bugs that have not yet been exposed by testing or by users. Wouldn't it be nice if there were a way to know where these bugs are? This talk presents two approaches for predicting the location of bugs. The first, the bug cache, holds 10% of the files in a software project. Through an analysis of the software's development history and the location of bugs, files are added to and removed from the cache based on four notions of bug locality: temporal, spatial, changed-entity, and new-entity locality. After processing, the files in the bug cache contain 73-95% of undiscovered bugs. To improve the localization of predicted bugs, the second approach focuses on configuration management commit transactions. Using machine learning techniques (Bayesian Networks and Support Vector Machines), we classify commits as likely or unlikely to contain a fault. Multiple tradeoffs among accuracy, precision, and recall are possible; one interesting combination yields an average buggy precision of 0.95 and buggy recall of 0.70. Hence, it is possible for a configuration management system to inform a developer that they have created a bug in a file they just committed, and have this assertion be correct 95% of the time, while identifying 70% of the bugs in the project. We end by posing a series of questions these results raise, including how developers might usefully take action based on this information. The work presented in this talk was performed in collaboration with Sunghun Kim and Shiv Shivaji.
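To make the bug cache idea concrete, here is a minimal sketch in Python. It is not the speaker's implementation. The abstract does not say how files are evicted or how each locality is defined, so the sketch assumes least-recently-used eviction and one plausible reading of the four localities: a file with a fixed bug is loaded (temporal), files changed together with it are loaded (spatial), and recently changed or newly added files are loaded (changed-entity and new-entity). The class and method names (BugCache, on_bug_fix, on_change) are hypothetical.

```python
from collections import OrderedDict


class BugCache:
    """Fixed-size set of files suspected to hold bugs.

    Assumption: least-recently-used eviction; the abstract does not
    specify a replacement policy.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._files = OrderedDict()  # path -> None, least recent first

    def __contains__(self, path):
        return path in self._files

    def files(self):
        """Cached paths, least recently loaded first."""
        return list(self._files)

    def _load(self, path):
        # Move the path to the most-recent position, then evict the
        # least recently loaded entries until the cache fits.
        self._files.pop(path, None)
        self._files[path] = None
        while len(self._files) > self.capacity:
            self._files.popitem(last=False)

    def on_bug_fix(self, buggy_file, co_changed=()):
        """Record a fixed bug; return True if the file was already cached."""
        hit = buggy_file in self._files
        for path in co_changed:   # spatial locality (assumed: co-changed files)
            self._load(path)
        self._load(buggy_file)    # temporal locality, loaded last so it stays
        return hit

    def on_change(self, changed=(), added=()):
        """Load recently changed and newly added files."""
        for path in list(added) + list(changed):
            self._load(path)


# Hypothetical history for a project where the cache holds two files.
cache = BugCache(capacity=2)
cache.on_change(added=["a.c"])
first_hit = cache.on_bug_fix("b.c", co_changed=["c.c"])  # miss: b.c not cached yet
second_hit = cache.on_bug_fix("b.c")                     # hit: b.c was loaded above
```

In this sketch a hit means the cache already contained the file when its bug was fixed; the hit rate over a project's history would correspond to the share of bugs the cache anticipated. In a real project the capacity would be set to 10% of the file count, as in the abstract.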
Bio:
Jim Whitehead is an Associate Professor of Computer Science at the University of California, Santa Cruz. His research interests lie in the areas of software evolution (bug prediction), automated software construction, and automatic generation of computer game levels. He recently developed a new undergraduate degree program in computer games, the BS Computer Science: Computer Game Design. Jim received his Ph.D. in Information and Computer Science from UC Irvine in 2000 under his advisor, Richard N. Taylor.