Clone Detection Research

The availability of large-scale source-code repositories has opened up many applications for clone detection. Unfortunately, despite a decade of active research, there is a marked lack of clone detectors that scale to large software repositories, particularly for detecting near-miss clones, where the cloned code may have undergone significant editing.

Project Dates: 
January 2014

We developed a token-based approach to large-scale code clone detection. It is based on a filtering heuristic that reduces the number of token comparisons needed when two code blocks are compared. We also developed a MapReduce-based parallel algorithm that uses the filtering heuristic and scales to thousands of projects. The filtering heuristic is generic and can also be used in conjunction with other token-based approaches; in that context, we demonstrated how it can increase the retrieval speed and decrease the memory usage of index-based approaches.
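
To illustrate how a filter can cut token comparisons, here is a minimal sketch in Python. It is not the project's actual heuristic, which this summary does not spell out. It assumes, purely for illustration, that two blocks count as clones when they share at least a fraction `threshold` of the larger block's tokens, and it stops comparing as soon as that bound is either reached or can no longer be reached. The function names and the 0.8 default are hypothetical.

```python
import math
from collections import Counter


def required_overlap(size_a, size_b, threshold):
    """Minimum number of shared tokens for two blocks to count as clones.

    Assumption for this sketch: overlap is measured against the larger block.
    """
    return math.ceil(threshold * max(size_a, size_b))


def is_clone_candidate(tokens_a, tokens_b, threshold=0.8):
    """Return (is_clone, comparisons) using an early-exit overlap check."""
    need = required_overlap(len(tokens_a), len(tokens_b), threshold)
    remaining_b = Counter(tokens_b)  # multiset of tokens in block B
    shared = 0
    comparisons = 0
    for i, tok in enumerate(tokens_a):
        # Filter: even if every remaining token of A matched,
        # could the pair still reach the required overlap?
        left = len(tokens_a) - i
        if shared + left < need:
            return False, comparisons
        comparisons += 1
        if remaining_b[tok] > 0:
            remaining_b[tok] -= 1
            shared += 1
            if shared >= need:
                # Enough overlap found; the rest need not be compared.
                return True, comparisons
    return shared >= need, comparisons


if __name__ == "__main__":
    block_a = "int i = 0 ; i < n ; i ++".split()
    block_b = "int j = 0 ; j < n ; j ++".split()
    print(is_clone_candidate(block_a, block_b, threshold=0.7))
    print(is_clone_candidate(["a"] * 10, ["b"] * 10))
```

In this sketch, two ten-token blocks with no tokens in common are rejected after three lookups instead of ten, because after three misses the remaining seven tokens cannot supply the eight shared tokens the threshold requires.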

Project Dates: 
July 2011 to January 2014