Friday, November 16, 2007

Massive Reuse Within the Open Source Community

A few people have asked how I account for the widespread code reuse in open source when estimating newly created versus reused open source code, so it seems like a good idea to fill in some more details.

Not surprisingly, the open source community is excellent at reusing code! Traditional estimates of code reuse, from papers like "On Finding Duplication and Near-Duplication in Large Software Systems" back in 1996, put code reuse in the 10% to 15% range. This has changed. More recently, in "Large-scale code reuse in open source software", Audris Mockus from Avaya Labs examined Linux and BSD distributions and found that more than 50% of the files were used in more than one project. In addition, he writes "The most widely reused components were small and represented templates requiring major and minor modifications and a group of files reused without any change. Some widely reused components involved hundreds of files."

From an analysis of Black Duck's database of open source code, I have found that only 39% of the source files are unique; in other words, 61% are reused from either the same or other open source projects. Sure, this is not exactly comparing apples to apples. The other analyses pick specific sets of applications or operating system distributions, whereas I look across more than 150,000 open source projects. Some of these projects incorporate another project wholesale, some start by cloning a project (effectively creating a branch), and others simply borrow a basic make system and a few other files to get started. Whichever way a project starts, development continues from there.
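To make the unique-versus-reused split concrete, here is a minimal sketch of one way such a percentage could be computed. It is not Black Duck's method: the function names, the sample data, and the choice of exact SHA-1 content matching are all assumptions made for illustration. It also takes one possible reading of "unique", namely a file whose content appears exactly once across all projects, so every copy of a duplicated file counts as reused.

```python
import hashlib
from collections import Counter


def content_digest(data: bytes) -> str:
    """Fingerprint a file by its exact bytes (assumption: exact-match only)."""
    return hashlib.sha1(data).hexdigest()


def reuse_stats(files):
    """Return (unique_fraction, reused_fraction) for an iterable of
    (project, path, content_bytes) tuples.

    A file is 'unique' if no other file in the input has the same content.
    """
    digests = [content_digest(data) for _project, _path, data in files]
    total = len(digests)
    if total == 0:
        return 0.0, 0.0
    occurrences = Counter(digests)
    unique = sum(1 for d in digests if occurrences[d] == 1)
    return unique / total, (total - unique) / total


# Hypothetical sample: three projects, five files, one file copied three times.
sample_files = [
    ("project-a", "util.c", b"int helper(void) { return 0; }"),
    ("project-b", "lib/util.c", b"int helper(void) { return 0; }"),
    ("project-b", "main.c", b"int main(void) { return 1; }"),
    ("project-c", "vendor/util.c", b"int helper(void) { return 0; }"),
    ("project-c", "app.c", b"int main(void) { return 2; }"),
]

unique_fraction, reused_fraction = reuse_stats(sample_files)
print(f"unique: {unique_fraction:.0%}, reused: {reused_fraction:.0%}")
```

Exact matching only catches files copied without change. Files reused as templates and then modified, like those Mockus describes, would need some form of near-duplicate detection instead.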

In addition to source code reuse, there is also significant reuse of unmodified binary components. Just as with reused source code, these can be complete projects, a complete component within a project, or just a few files.

Both source code reuse and binary reuse are captured in the following graph showing some of the most reused open source projects:



This graph was extracted from the Black Duck whitepaper The Quest for an "Open Source Genome" and shows how many times files from popular open source projects have been reused in other open source projects. It clearly shows that files from some open source projects are reused in thousands of other open source projects. In fact, 46 open source projects are each reused in more than 1000 other open source projects, which shows that the open source community really is serious about reusing code!



How much Open Source do you (re)use?









 



