Friday, December 21, 2007

Experience with Open Source in Korea

In contrast to my experience in Hong Kong, the next stop in Korea presented a much more familiar view.

It was not just the snow that could have been in Boston. So could the discussions about using decision support processes to incorporate open source software in an effective and controlled manner; they were in fact very similar to many of the discussions I have had in the U.S. and in Europe.

The food and entertainment, however, were at a completely different level than in Boston. Then again, I would expect nothing less given my previous experiences with Koreans!

Is it good for Korea to be like the U.S. and Europe?

Wednesday, December 12, 2007

Caring about AGPL - Open Source in Hong Kong

I arrived in Hong Kong Tuesday night. Besides finding it an amazing city, I had discussions here which triggered some thoughts.

In the US and Europe, it has been relatively seldom that an open source discussion has centered on AGPL, but in Hong Kong, AGPL turns out to be a really hot topic. The number of "Rolex" watches you get offered when walking down the street in Hong Kong seems to justify why we often think of China and Hong Kong as caring little about intellectual property. However, Hong Kong has changed a lot over the last decade, as noted by U.S. Consul General, Hong Kong, James B. Cunningham, and large international Hong Kong based companies appear to be at least as concerned with intellectual property as any large company in the West. In addition to that, the Western companies may actually be the ones focused on yesterday's problem and not staying on top of intellectual property issues. The attention to AGPL and its implications is certainly an indicator of that.

Who understands software IP in the 21st century?

Monday, December 10, 2007

Keeping Busy with GPL

BusyBox is upping the ante on GPL compliance with their new lawsuit against Verizon, stating that Verizon is infringing on the GPL (here is the actual filing) when shipping their FiOS routers. For those of you who are not familiar with Verizon, Verizon is one of the biggest telecommunication providers in the United States, and FiOS is their version of fiber-optic Internet all the way to the end-user.

Companies like Verizon should really learn not to ignore requests for open source license compliance (which allegedly they did). I suspect the direct and indirect costs they are going to incur for dealing with this publicity are much higher than the costs of any effort to comply with the GPL license. It appears that simply listing companies on a Hall of Shame, like BusyBox used to do, is not the way open source developers are going to enforce licenses going forward.



Thursday, December 6, 2007

Is Venture Money Needed? The Changing Cost of Starting a Business

On a recent plane trip, a colleague of mine brought up something interesting. While attending conferences where speakers have presented or demonstrated open source products, he decided to observe the audience reaction just as much as the presentation itself.

For presentations of a non-commercial open source project, there is a point at which the audience realizes that the product is completely free - and their reaction is one of intrigued amazement. It is a long process, but people are slowly realizing how open source completely changes the cost of IT infrastructure, as in the case of Genuitec, which was able to avoid VC backing by keeping development costs down with open source. This was recently summarized by a former CIO, who said “You can start an emerging technology company for little money - $100k instead of $5M in the '90s - by using open source for most of the technology needs.”

Just to remove any confusion, I am not talking about companies developing open source products, but mainly about utilizing open source in everyday business as illustrated in "Open Source is a pillar of strength for all startups to build on."

In addition to savings for technology companies, simply using open source for basic desktop applications can generate initial savings of $10,000 for a company with just 10 employees. Then add the use of open source for standard business IT infrastructure, e.g. a directory server (OpenLDAP), file servers (Linux/Samba), a phone system (Asterisk), an email and collaboration server (Zimbra), a database server (MySQL/PostgreSQL), a CRM system (SugarCRM) and a Content Management System (Drupal/Joomla), and the savings can multiply.

I know that many arguments can be made around Total Cost of Ownership (TCO) calculations due to things like ongoing support and expertise, but we will not go into that discussion now - I will, however, leave you with a link to this TCO Tool. Also note another new development over the last couple of years, where many of these applications are available as SaaS offerings - again eliminating many initial IT costs.

It's gone relatively unnoticed, but many small businesses are catching on, as numbers from the 2007 Desktop Linux Survey show. According to the survey, 69.5% of Linux desktops are deployed in home offices and small business settings (defined as up to 100 desktops) as opposed to in medium and large organizations.

The mainstream appears to be discovering what has been obvious to the few of us in the open source industry for a while. It is apparent that many businesses can be started with a much smaller amount of capital, which in many cases can eliminate the need for the traditional pre-revenue Series A round. However, I believe that it does not eliminate the need for significant investment (and the network some investors bring to the table) in order to expand the business and build out sales channels. What has changed is that companies can get further in their early stages of life before looking for investment, and this could change the dynamic of an initial investment round.

Is it easier or harder to start a technology business in 2007 than 10 years ago?


Wednesday, November 21, 2007

Is AGPL (Affero GPL) the Doom of Google?

This is actually a question I was asked today!

I am referring to the GNU Affero General Public License Version 3 (AGPL), which was just released Monday. AGPL extends GPL to give end-users access to the source code for AGPL licensed software accessed over a network. As the Free Software Foundation (FSF) says in their press release, "The FSF recommends that people consider using the GNU AGPL for any software which will commonly be run over a network".

Well, a very significant portion of (if not most) software developed today will be accessed over a network - so if a large group of developers follows FSF's advice, AGPL could become a very widely used license. Normally, a new license would not be easily adopted by an open source project, since it would limit which other open source code could be reused within the project. However, the special situation here is that the AGPL is compatible with the GPL because of a special provision for such compatibility in Version 3 of both the AGPL and the GPL.

Now, if we take a look at the distribution of open source licenses among open source projects, we can see how many of these projects are under GPL compatible licenses (see FSF's Licenses Page for details on compatibility). Since the majority of GPLv2 projects are re-licensable under GPLv3, we end up with 90-95% of open source projects compatible with Version 3 of the GPL - and thereby also compatible with the new AGPL. For an illustration of how licenses can be combined, see David Wheeler's The Free-Libre / Open Source Software (FLOSS) License Slide or the chart halfway down the FSF page A Quick Guide to GPLv3.

From the individual open source developers that I have talked to, most do not start an open source project with any particular "political" licensing agenda, but they commonly have a few simple goals:

1. Ability to use as much other open source software as possible
2. Get other developers to contribute, and
3. Don't let somebody "steal" the code.

Many developers think that the GPL covers these bases decently - which is why it has become a favorite for new projects among non-corporate developers. However, this could now be changing with the introduction of the AGPL, since a project under the AGPL can still include the same 90+% of open source code that a project under the GPL can include - and by using the AGPL, the developer can arguably get closer to goals 2 and 3 - especially for web enabled software applications. One of the drafters of GPLv3, Eben Moglen, has already stated that in his opinion, "Google and Yahoo are morally obliged to share their GPL code", but software licensed under the GPL cannot force these companies to do so. However, with the introduction of the AGPL, open source developers now have a "tool" to force such sharing, and if the approach is adopted, we might see a serious move towards using AGPL for new projects started by non-corporate open source developers.

As we can see with the uptake of GPLv3, any adoption takes time, and even if the AGPL becomes popular among open source developers, it could be a while before significant portions of software are only available under the AGPL, and we may not see the real effects for another couple of years. However, even a relatively limited adoption would require organizations developing websites to be more careful in tracking their code-bases. They need to do this in order to either avoid AGPL code or know which code they need to make available to their users.

A wide adoption of the AGPL would change a current standard practice for creating a web application, where the developers start with a few pieces of GPL software and then modify the software until it suits their needs. With AGPL software in the mix, a business decision would have to be made on whether to use AGPL software and make source code for modifications and additions available - or to avoid AGPL software and spend more time developing software which can be kept out of the hands of competitors and potential hackers.

Larger companies, e.g. Google and Yahoo, are actually among the best positioned to live in this new world. They can carefully evaluate the trade-offs on a case-by-case basis and can introduce processes to make sure that AGPL code does not sneak into places where it should not be. If we end up in a world where major new inventive software is only available under the AGPL, they might obviously face new competition, but this should be a manageable issue, and I have faith that Google and Yahoo will adapt.

What is the impact of AGPL?


Friday, November 16, 2007

Massive Reuse Within the Open Source Community

A few people have commented on how I actually account for the widespread code reuse in open source when estimating newly created open source code vs. reused open source code, so it seems like a good idea to fill in some more details.

Not surprisingly, the open source community is excellent at reusing code! Traditional estimates of code reuse from papers like On Finding Duplication and Near-Duplication in Large Software Systems back in 1996 put the code reuse in the 10% to 15% range. This has changed, and more recently in Large-scale code reuse in open source software, Audris Mockus from Avaya Labs examined Linux and BSD distributions and found that more than 50% of the files were used in more than one project. In addition, he writes "The most widely reused components were small and represented templates requiring major and minor modifications and a group of files reused without any change. Some widely reused components involved hundreds of files."

From an analysis of Black Duck's database of open source code, I have found that only 39% of the source files are unique -- in other words, 61% are reused from either the same or other open source projects. Sure, this is not exactly comparing apples to apples. The other analyses pick specific sets of applications or operating system distributions, whereas I look across more than 150,000 open source projects. Some of these projects incorporate another project wholesale, some start by cloning a project (effectively creating a branch), whereas others simply use a basic make system and a few other files to get started. Whichever way it starts, the development continues from there.

In addition to the source code reuse, there is also significant reuse of unmodified binary components. Just as for reused source code, these can be complete projects, a complete component within a project, or just a few files.
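To make the "unique vs. reused" split concrete, here is a minimal sketch of how such a share could be computed. It is not Black Duck's method, which the post does not describe. It assumes that "unique" means a file whose exact byte content appears only once in the collection, and it uses SHA-256 digests to compare files. The function name `unique_file_share` and the sample data are hypothetical.

```python
import hashlib
from collections import Counter


def unique_file_share(file_contents):
    """Return the fraction of files whose exact content occurs only once.

    Assumption: two files count as "reused" only if they are byte-for-byte
    identical. Near-duplicates (renamed variables, changed headers) are not
    detected by this sketch.
    """
    digests = [hashlib.sha256(content).hexdigest() for content in file_contents]
    if not digests:
        return 0.0
    counts = Counter(digests)
    unique = sum(1 for digest in digests if counts[digest] == 1)
    return unique / len(digests)


# Hypothetical sample: three copies of one file plus two one-off files.
sample = [b"a", b"a", b"b", b"c", b"a"]
share = unique_file_share(sample)  # 2 of the 5 files are unique
```

Exact-match hashing gives a lower bound on reuse, since files that were copied and then lightly modified are counted as unique.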

Both source code reuse and binary reuse are captured in the following graph showing some of the most reused open source projects:

This graph was extracted from the Black Duck whitepaper The Quest for an "Open Source Genome" and shows how many times files from popular open source projects have been reused in other open source projects. It clearly shows that files from some open source projects are reused in thousands of other open source projects. The number of open source projects reused in more than 1000 other open source projects is 46, which shows that the open source community really is serious about reusing code!

How much Open Source do you (re)use?


Friday, November 9, 2007

The Open Source Community as a Top 100 Country

For months Black Duck Software CEO Doug Levin has been writing a blog. It is interesting and offers useful insights into the open source community, software development and other things. After a series of inquiries, I slowly came to the conclusion that this was a good way to share my point of view as well. So I am writing this blog.

As you may know, Black Duck Software maintains a database of all the open source code that we know of, and this database gets updated continuously day in and day out.

This morning I took a look at the amount of open source code that we receive every day. I decided to only look at new unique source files found in actual project releases. That rules out non source code (e.g. documentation and binaries), interim code which is not officially released, and duplicates of existing files that developers reuse from the same or other open source projects. Even so, approximately 4.7 million lines of code are added every day - which translates into 1.7 billion lines of code each year. Although we are probably missing some parts of open source out there - which would make this an underestimate - we can take a leap of faith and use this as a proxy for the amount of open source code created in the world, and then we can get some idea of the value created by the open source community.

Now let's make a bunch of assumptions and try to see the value of the effort in creating such an amount of code. Assuming an average open source project is 35,000 lines of code and the average cost of a software developer is $30/hour (~$60,000/year), a simple COCOMO II calculator tells us that the average open source project costs $630,000 to develop. This cost translates into $18 per line of code. Extrapolating that to 1.7 billion lines of code gives us an estimated value of $30.6 billion/year. Changing perspective for a second, if the open source community were a country with a GDP of $30.6 billion, it would rank 77th, right between Bulgaria and Lithuania, according to the International Monetary Fund's list of GDP by country, thereby putting the open source community ahead of most countries in the world.
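The arithmetic behind these numbers can be retraced in a few lines. This sketch only reproduces the figures quoted above; it does not rerun COCOMO II, so the $630,000 project cost is taken as given. The constant and function names are my own.

```python
# Figures quoted in the post.
LINES_PER_DAY = 4_700_000      # new unique lines of code received per day
PROJECT_SLOC = 35_000          # assumed size of an average project
PROJECT_COST_USD = 630_000     # COCOMO II result quoted above, not recomputed


def cost_per_line(project_cost_usd, project_sloc):
    """Average development cost per line of code."""
    return project_cost_usd / project_sloc


def lines_per_year(lines_per_day, days=365):
    """Yearly volume of new code at a constant daily rate."""
    return lines_per_day * days


def yearly_value(lines, usd_per_line):
    """Estimated development cost of one year's worth of new code."""
    return lines * usd_per_line


usd_per_line = cost_per_line(PROJECT_COST_USD, PROJECT_SLOC)   # 18.0
exact_lines = lines_per_year(LINES_PER_DAY)                    # 1,715,500,000
rounded_lines = 1_700_000_000                                  # rounding used in the post
value_usd = yearly_value(rounded_lines, usd_per_line)          # 30.6 billion

print(f"${usd_per_line:.0f} per line, ${value_usd / 1e9:.1f} billion per year")
```

Using the unrounded 1,715,500,000 lines instead would give roughly $30.9 billion, so the rounding does not change the order of magnitude.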

You can argue about whether this number is high or low, and you can argue whether the basic COCOMO calculation on a 35,000 line project can be extrapolated. However, the $30 billion/year number seems consistent with previous estimates such as David A. Wheeler's More Than a Gigabuck: Estimating GNU/Linux's Size estimating the cost to develop all elements of the Red Hat Linux 7.1 distribution as $1.08 billion, and the study: Economic impact of open source software on innovation and the competitiveness of the Information and Communication Technologies (ICT) sector in the EU estimating the cost to develop the elements of the Debian 3.1 distribution (until 2005) at €11.9 billion -- increasing to a cumulative €100 billion (~$146 billion) by 2010.

According to these rough calculations, the direct economic impact (ignoring any indirect economic impact) of the open source community appears to be larger than the economic impact of most individual countries in the world. Even if the numbers could be somewhat off and not a perfect measurement of impact, they do show that the development cost of open source is in the same order of magnitude as many countries' GDP. Such an economic force should not be underestimated, and this is yet another indication that open source has become a significant part of the technology world.