Towards a Normalized Java Resource (NJR) - A Workshop at ASE 2019

Prof. Cristina Lopes and UCLA Prof. Jens Palsberg have organized a series of workshops with the goal of establishing a Normalized Java Resource (NJR). The first workshop was held in 2017 at the SPLASH conference in Vancouver, Canada. The most recent was held in November at the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) in San Diego. This was the sixth workshop on the topic; two additional workshops are planned.

The goal of the NJR workshops is to create an infrastructure that consists of 100,000 executable Java programs together with a set of working tools and an environment for building new tools. Researchers will be able to search for Java programs with desired characteristics and write scripts that build new tools on top of the existing working tools. They will receive each search result as a container that can run either locally or on a cloud service. Lopes and Palsberg envision scalable processing through a normalized representation of each Java program, so that a tool can easily be run on the entire collection. They will also ensure diversity by running clone detection and removing near-duplicates. The NJR will lower the barrier to implementing new tools, speed up research, and ultimately help advance research frontiers.

At the November workshop held at ASE, Prof. Nenad Medvidović gave a presentation titled “ARCADE - A Workbench for Mining Architectural Information and Identifying Technical Debt” in which he introduced ARCADE, an integrated collection of tools for isolating three types of architectural information from details readily available about a system’s implementation: architectural design decisions, change, and decay.

Prof. Joshua Garcia gave a presentation titled “Why We’re Going in SAIN: Producing a Community-Wide Software Architecture Infrastructure,” in which he described how a team of software-engineering and software-architecture researchers, in collaboration with nearly 50 researchers worldwide, is producing the Software Architecture INfrastructure (SAIN), a community-wide research infrastructure to support empirical research at the intersection of software maintenance and software architecture.

Prof. Cristina Lopes presented her work on “50K-C: A Dataset of Compilable and Compiled, Java Projects.” 50K-C is a set of 50,000 Java projects collected from GitHub with very little duplication; together with their included dependencies, the projects are guaranteed to compile and run. Prof. Lopes and her students are now working to make the dataset more interactive, so that researchers can search for projects with specific properties, such as using a particular API or having test cases.

The workshop succeeded in connecting the several Java-related research infrastructure platforms that participants are working on. Some common concerns arise from the rapid evolution of the Java language in recent years. For better or worse, Java was very conservative about adding new features up through Java 8. But starting with Java 9, new features such as modules, functional elements, and type inference have brought many changes to the ecosystem relatively quickly. Although not many projects rush to adopt the new language features, these recent changes will force our tools to adapt as well. Additional workshops will be held in 2020.

Recent NJR workshops are supported by NSF award #1823227.

Are you interested in participating? Contact Prof. Lopes.

This article appeared in ISR Connector issue: