Meet Software Sleuth Prof. James A. Jones

Whereas many view the complexity, immensity, and dynamism of software systems as formidable difficulties, ISR Professor James A. Jones views these traits with excitement.  “Software is a living artifact, constantly changing, full of complexities and intricacies that are fascinating to imagine,” says Jones.  He puts that imagination to work in his research, which helps developers understand how programs behave, both correctly and incorrectly, so that they can carry out maintenance tasks and find and fix bugs.

According to Jones, the fundamental artifact to study for software is its source code.  Unlike artifacts such as requirements, formal specifications, and design documents, source code is nearly always present and available to its developers, regardless of the diligence, maturity, or conformity of the development organization.  Also, unlike those other artifacts, which are often not kept up to date, source code is practically guaranteed to be an accurate and current representation of the software's actual functionality.  Moreover, source code carries rich embedded documentation in the form of comments and identifier names.

However, the completeness and currency of source code come at a cost: the artifact is complex, exhibits myriad behaviors when executed, usually contains bugs, and is ever-changing.  As Jones explains, “An instruction in the source code may set the value for variable ‘X’. ‘X’ may be referenced and used in ten other instructions. And that variable, ‘X,’ might also affect execution paths through five predicates (such as ‘if’ or ‘while’ instructions that check its value). So, in this example, the definition of ‘X’ directly affects 15 instructions, but each of those affected instructions in turn affects another set of statements, and so on. Altogether, the set of ways individual instructions can affect other instructions creates a large and complex web of influence.”  To further describe such complexities, Jones continues, “...think about the execution of that program — billions of individual instructions firing in the blink of an eye.”  Additionally, “...the source code is often modified daily, and those modifications change those instructions’ relationships in sometimes drastic ways, and other times, in subtle but insidiously destructive ways.”
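The “web of influence” in Jones's example can be pictured as a directed dependence graph: an edge from A to B means A can affect B, either because B uses a value A defines (a data dependence) or because A is a predicate that decides whether B runs (a control dependence). Finding everything one definition can reach is then a graph traversal, roughly a forward slice. The Python sketch below is illustrative only; the graph, the statement labels, the `influenced_by` function, and the assumption that each predicate guards two further statements are all hypothetical, not taken from Jones's tools.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependence graph: an edge A -> B means statement A can
# affect statement B through a data or control dependence.
Graph = Dict[str, List[str]]


def influenced_by(graph: Graph, start: str) -> Set[str]:
    """Return every statement transitively affected by `start` (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


# The definition of X directly affects 10 uses and 5 predicates (15 statements).
graph: Graph = {
    "def_x": [f"use_{i}" for i in range(10)] + [f"pred_{i}" for i in range(5)]
}
# Assumption for illustration: each predicate guards two further statements.
for i in range(5):
    graph[f"pred_{i}"] = [f"guarded_{i}_a", f"guarded_{i}_b"]

direct = graph["def_x"]
total = influenced_by(graph, "def_x")
print(len(direct), len(total))  # 15 25
```

Even in this toy graph, one more level of dependence raises the affected set from 15 to 25 statements; in a real program the chains continue much further.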

Jones attacks these challenges head-on by gathering these artifacts (the source code, its executions, and its changes), then processing and analyzing them with methods borrowed from statistics, compilers, and information-visualization research, to create models of the program and techniques that operate on those models.  One such technique Jones created, called Tarantula, has become an influential and widely cited body of work on locating bugs, a problem known as fault localization.  Tarantula observes the program as it executes its test cases and performs a correlation analysis to identify which instructions are most strongly associated with failing executions.  The results are then displayed on a visualization of the program that guides a developer toward the likely location of the bugs.
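In the published formulation of Tarantula, each statement s is scored from test coverage and pass/fail outcomes. Let passed(s) and failed(s) be the number of passing and failing tests that execute s, and totalpassed and totalfailed the totals. Then suspiciousness(s) = (failed(s)/totalfailed) / (passed(s)/totalpassed + failed(s)/totalfailed). Statements executed mostly by failing tests score near 1 and are drawn toward red in the visualization; those executed mostly by passing tests score near 0 and appear green. The Python sketch below computes that score. The function name, the input format, and the four-test example are hypothetical and are not Jones's implementation.

```python
from typing import Dict, List, Set


def tarantula(coverage: List[Set[int]], passed_flags: List[bool]) -> Dict[int, float]:
    """Tarantula suspiciousness per statement.

    coverage[i] is the set of statement ids executed by test i;
    passed_flags[i] is True if test i passed.
    """
    total_passed = sum(1 for ok in passed_flags if ok)
    total_failed = len(passed_flags) - total_passed

    # Count how many passing / failing tests execute each statement.
    passed: Dict[int, int] = {}
    failed: Dict[int, int] = {}
    for stmts, ok in zip(coverage, passed_flags):
        counts = passed if ok else failed
        for s in stmts:
            counts[s] = counts.get(s, 0) + 1

    scores: Dict[int, float] = {}
    for s in set(passed) | set(failed):
        p_ratio = passed.get(s, 0) / total_passed if total_passed else 0.0
        f_ratio = failed.get(s, 0) / total_failed if total_failed else 0.0
        denom = p_ratio + f_ratio
        scores[s] = f_ratio / denom if denom else 0.0
    return scores


# Hypothetical test suite: two passing and two failing tests over statements 1-4.
coverage = [{1, 2, 3}, {1, 2}, {1, 3, 4}, {1, 4}]
passed_flags = [True, True, False, False]

scores = tarantula(coverage, passed_flags)
ranking = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [(4, 1.0), (1, 0.5), (3, 0.5), (2, 0.0)]
```

Statement 4 runs only in failing tests, so it ranks first, which is where a developer would start looking.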

A recent example of such work includes techniques and large-scale program models, developed by Jones and his research group, that let developers view and explore source code.  Specifically, the models describe the webs of relationships between instructions in a program and reveal emergent behaviors of the source code.  Using these models and techniques, developers can be “steered” through the program while investigating the root cause of a bug or the preconditions under which that bug manifests.  In addition, functionality that cuts across the program's modular structure can be identified and mapped, to aid developer comprehension or to inform refactoring.
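To make the idea of mapping cross-cutting functionality concrete, the Python sketch below uses a simple, well-known heuristic called software reconnaissance. It is not a method attributed to Jones. Statements that run when a feature is exercised, but never in runs that do not exercise it, are attributed to that feature. The `path:line` identifiers, the traces, and both function names are hypothetical. If the resulting statements span several files, the feature cross-cuts the program's structure.

```python
from collections import defaultdict
from typing import Dict, List, Set


def feature_statements(with_feature: List[Set[str]], without_feature: List[Set[str]]) -> Set[str]:
    """Statements executed by some feature run but by no non-feature run."""
    exercised = set().union(*with_feature)
    background = set().union(*without_feature)
    return exercised - background


def group_by_file(statements: Set[str]) -> Dict[str, List[int]]:
    """Group hypothetical "path:line" identifiers by source file."""
    by_file: Dict[str, List[int]] = defaultdict(list)
    for stmt in statements:
        path, line = stmt.rsplit(":", 1)
        by_file[path].append(int(line))
    return {path: sorted(lines) for path, lines in sorted(by_file.items())}


# Hypothetical execution traces: the set of statements each run executed.
checkout_runs = [
    {"cart.py:10", "cart.py:11", "log.py:5", "db.py:40"},
    {"cart.py:10", "db.py:40", "db.py:41"},
]
other_runs = [
    {"cart.py:11", "main.py:1"},
    {"main.py:1", "db.py:41"},
]

feature = group_by_file(feature_statements(checkout_runs, other_runs))
print(feature)  # {'cart.py': [10], 'db.py': [40], 'log.py': [5]}
```

Here the hypothetical checkout feature touches three files, which is the kind of scattered functionality that is hard to see from the module structure alone.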

Another example is the development of techniques for exploring and comprehending how a program evolves.  Jones’s group developed models of fine-grained software evolution that track the lineage of each source-code instruction throughout the lifetime of the software system.  The goal is to let developers query these models and quickly understand how features of the software were changed, which may reveal patterns in the software's evolution.  Such patterns can inform many software development and maintenance tasks.  One such task, currently under development, is recommending the developer with the most expertise for fixing a given bug.
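A per-line lineage model and an expertise query can be sketched in a few lines of Python. This is a hypothetical simplification, not Jones's model. It assumes each line already has a stable identifier across revisions, which is the hard part the real research must solve by matching lines between versions. It also ranks developers with a naive heuristic: whoever changed the bug-related lines most often. The names `Change`, `Lineage`, and `recommend_expert` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    revision: int
    author: str
    text: str  # content of the line after this change


@dataclass
class Lineage:
    """Ordered history of one source line, keyed elsewhere by a stable (hypothetical) ID."""
    history: List[Change] = field(default_factory=list)

    def record(self, revision: int, author: str, text: str) -> None:
        self.history.append(Change(revision, author, text))


def recommend_expert(lineages: Dict[str, Lineage], bug_lines: List[str]) -> Optional[str]:
    """Naive heuristic: the author who changed the bug-related lines most often."""
    votes: Counter = Counter()
    for line_id in bug_lines:
        for change in lineages[line_id].history:
            votes[change.author] += 1
    return votes.most_common(1)[0][0] if votes else None


lineages = {"ln1": Lineage(), "ln2": Lineage()}
lineages["ln1"].record(1, "alice", "x = 0")
lineages["ln1"].record(5, "bob", "x = 1")
lineages["ln1"].record(9, "bob", "x = compute()")
lineages["ln2"].record(2, "carol", "if x > 0:")

# Query: how did line ln1 evolve?
print([(c.revision, c.text) for c in lineages["ln1"].history])
# Who changed the lines implicated in a (hypothetical) bug most often?
print(recommend_expert(lineages, ["ln1", "ln2"]))  # bob
```

A realistic recommender would likely weight recent changes more heavily and account for the size of each change; counting edits simply keeps the sketch short.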

Jones’s research group, the Spider Lab, includes Ph.D. students Francisco Servant, Nicholas DiGiuseppe, and Fang Deng; M.S. student Vijay Krishna Palepu; and undergraduate student Theodore Suzukawa.  Jones’s research has been funded by grants from NSF, Google, Boeing, and Tata Consultancy Services.

For more information on Jones and his research group, visit the Spider Lab website.

This article appeared in ISR Connector issue: