NJR: A Normalized Java Resource

Project Dates: 
March 2019

This project has been funded by the National Science Foundation.

Project Description: 

We are on the cusp of a major opportunity: software tools that take advantage of Big Code. Specifically, Big Code will enable novel tools such as security enhancers, bug finders, and code synthesizers. What do researchers need from Big Code to make progress on their tools? Our answer is an infrastructure that consists of 100,000 executable Java programs, together with a set of working tools and an environment for building new tools. This Normalized Java Resource (NJR) will lower the barrier to implementing new tools, speed up research, and ultimately help advance research frontiers.

Researchers gain significant advantages from using NJR. They can write scripts that build their new tools on NJR's already-working tools, and they can search NJR for programs with desired characteristics. They receive the search results as a container that they can run either locally or on a cloud service. Additionally, they benefit from NJR's normalized representation of each Java program, which makes it possible to run tools at scale on the entire collection. Finally, NJR's collection of programs is diverse, because we run clone detection and remove near-duplicates. In this paper we describe our vision for NJR and our current prototype.
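The description does not say how NJR detects near-duplicates. As a rough illustration of the general idea only, here is a minimal sketch of one common technique: compare programs by their sets of k-token "shingles" and drop any program whose Jaccard similarity to an already-kept program reaches a threshold. The function names, the simple regex tokenizer, `k = 3`, and the 0.8 threshold are all hypothetical choices for this sketch, not NJR's actual pipeline.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical tokenizer: splits Java-like source into identifiers, numbers,
# and single non-whitespace characters. It does not treat comments or string
# literals specially; a real pipeline would use a proper Java lexer.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def shingles(source: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-token shingles (contiguous token windows)."""
    tokens = TOKEN_RE.findall(source)
    if len(tokens) < k:
        # Too short for a full window: use the whole token sequence, if any.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(programs: Dict[str, str],
                           threshold: float = 0.8,
                           k: int = 3) -> List[str]:
    """Greedily keep a program only if its similarity to every program kept
    so far is below `threshold`. Returns kept names in input order."""
    kept: List[Tuple[str, Set]] = []
    for name, source in programs.items():
        sig = shingles(source, k)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((name, sig))
    return [name for name, _ in kept]


if __name__ == "__main__":
    programs = {
        "A": "class A { int f(int x) { return x + 1; } }",
        "A_copy": "class A { int f(int x) { return x + 1; } }",
        "B": "class B { void run() { System.out.println(42); } }",
    }
    print(remove_near_duplicates(programs))  # ['A', 'B']
```

The greedy pass compares each program against all kept ones, which is quadratic in the collection size; at the scale of 100,000 programs a practical system would more likely use an approximation such as MinHash with locality-sensitive hashing to find candidate pairs first.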