This paper refines Ingo Molnar’s estimate of the development effort it would take to redevelop Linux kernel version 2.6. Molnar’s rough estimate found it would cost $176M (US) to redevelop the Linux kernel using traditional proprietary approaches. By using a more detailed cost model and much more information about the Linux kernel, I found that the effort would be closer to $612M (US) to redevelop the Linux kernel as it existed in 2004. A postscript lists some recalculations since then, showing that these values have grown. In any case, the Linux kernel is clearly worth far more than the $50,000 offered in 2004.
On October 7, 2004, Jeff V. Merkey made the following offer on the linux.kernel mailing list:
We offer to kernel.org the sum of $50,000.00 US for a one time license to the Linux Kernel Source for a single snapshot of a single Linux version by release number. This offer must be accepted by **ALL** copyright holders and this snapshot will subsequently convert the GPL license into a BSD style license for the code.
Many respondents noted that this proposal was unworkable, because it required complete agreement by all copyright holders. Not only would such a process be lengthy, but many copyright holders made it clear in various replies that they would not agree to any such plan. Many Linux kernel developers expect improved versions of their code to be continuously available to them, and a release using a BSD-style license would violate those developers’ expectations. Indeed, it was clear that many respondents felt that such a move would strip the Linux kernel of legal protections against someone who wanted to monopolize a derived version of the kernel. Many open source software / Free software (OSS/FS) developers allow conversion of their OSS/FS programs to a proprietary program; some even encourage it. The BSD-style licenses are specifically designed to allow conversion of an OSS/FS program into a proprietary program. However, the GPL is the most popular OSS/FS license, and it was specifically designed to prevent this. Based on the thread responses, it’s clear that many Linux kernel developers prefer that the GPL continue to be used as the Linux kernel license.
In addition, many people were suspicious about the motives for this offer. Groklaw published an article that mentioned this proposal, and noted that someone with the same name is listed on a patent recently obtained by the Canopy Group. SCO is a Canopy Group company, and I have since confirmed that the patent application refers to the same person. Groklaw later tried to learn more about him. I don’t really know why Merkey made this proposal, and it doesn’t really matter. What’s more interesting to me is the question that this raised, namely, how much is Linux “worth”? That is a valid question!
In one of the responses, Ingo Molnar calculated the cost to re-develop the Linux kernel using my tool SLOCCount. Molnar didn’t specify exactly which version of the Linux kernel he used, but he did note that it was in the version 2.6 line, and presumably it was a recent version as of October 2004. He found that “the Linux 2.6 kernel, if developed from scratch as commercial software, takes at least this much effort under the default COCOMO model”:
Total Physical Source Lines of Code (SLOC)                = 4,287,449
Development Effort Estimate, Person-Years (Person-Months) = 1,302.68 (15,632)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 8.17 (98.10)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 159.35
Total Estimated Cost to Develop                           = $ 175,974,824
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
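The figures in that output can be checked by hand from the two Basic COCOMO formulas it quotes. The following is a minimal sketch of that arithmetic, not SLOCCount's own code; the function name `basic_cocomo` and its parameter names are made up for illustration, while the constants (2.4, 1.05, 2.5, 0.38, the salary, and the overhead) are the ones shown in the output above.

```python
# Minimal sketch of the Basic COCOMO arithmetic quoted in the SLOCCount output.
# The function and parameter names are illustrative assumptions, not
# SLOCCount internals.

def basic_cocomo(sloc, salary=56286.0, overhead=2.40):
    """Return (person_months, schedule_months, developers, cost)."""
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05            # effort
    schedule_months = 2.5 * person_months ** 0.38  # calendar time
    developers = person_months / schedule_months   # average team size
    cost = (person_months / 12.0) * salary * overhead
    return person_months, schedule_months, developers, cost


pm, months, devs, cost = basic_cocomo(4287449)
print(f"Effort:     {pm:,.0f} person-months ({pm / 12:,.2f} person-years)")
print(f"Schedule:   {months:.2f} months ({months / 12:.2f} years)")
print(f"Developers: {devs:.2f}")
print(f"Cost:       ${cost:,.0f}")
```

The results should agree with the reported values up to rounding, since SLOCCount prints rounded intermediate figures.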
After noting the redevelopment cost of $176M (US), Ingo Molnar then commented, “and you want an unlimited license for $0.05M? What is this, the latest variant of the Nigerian/419 scam?”
Strictly speaking, the value of a product isn’t the same as the cost of developing it. For example, if no one wants to use a software product, then it has no value, no matter how much was spent in developing it. The value of a proprietary software product to its vendor can be estimated by computing the amount of money that the vendor will receive from it over all future time (via sales, etc.), minus the costs (development, sustainment, etc.) over that same time period -- but predicting the future is extremely difficult, and the Linux kernel isn’t a proprietary product anyway. Estimating value to users is difficult too; in fact, value is surprisingly difficult to compute directly. But if a software product is used widely, so much so that you’d be willing to redevelop it, then development costs are a reasonable way to estimate the lower bound of its value. After all, if you’re willing to redevelop a program, then it must have at least that value. The Linux kernel is widely used, so its redevelopment costs will at least give you a lower bound of its value.
Thus, Molnar’s response is quite correct -- offering $50K for something that would cost about $176M to redevelop is ludicrous. It’s true that the kernel developers could continue to develop the Linux kernel after a BSD-style release; after all, the *BSD operating systems do this now. But with a BSD-style release, someone else could take the code and establish a competing proprietary product, and it would take time for the kernel developers to add enough additional material to compete with such a product. It’s not clear that a proprietary vendor could really pick up the Linux kernel and maintain the same pace without many of the original developers, but that’s a different matter. Certainly, the scale of the difference between $176M and $50K is enough to see that the offer is not very much, compared to what the offerer is trying to buy.
But in fact, it’s even sillier than it appears; I believe the cost to redevelop the Linux kernel would actually be much greater than this. Molnar correctly notes that he used the default Basic COCOMO model for cost estimation. This is the default cost model for SLOCCount, because it’s a reasonable model for rough estimates about typical applications. It’s also a reasonable default when you’re examining a large set of software programs at once, since the ranges of real efforts should eventually average out (this is the approach I used in my More than a Gigabuck paper). So, what Molnar did was perfectly reasonable for getting a rough order of magnitude of effort.
But since there’s only one program being considered in this analysis -- the Linux kernel -- we can use a more detailed model to get a more accurate cost estimate. I was curious what the answer would be. So I’ve estimated the effort to create the Linux kernel, using a more detailed cost model. This paper shows the results -- and it shows that redeveloping the Linux kernel would cost even more.
This estimate is what it would cost to rebuild a particular version, and not exactly the same as the effort actually invested into the kernel. In particular, in Linux kernel development, a common practice is to have a “bake-off” where competing ideas are all implemented and then measured; the approach with the best result (e.g., faster) is then used. Bake-offs have much to commend them, but since only one approach is actually included, the effort invested in the alternatives isn’t included in this estimate.
To get better accuracy in our estimation, we need to use a more detailed estimation model. An obvious alternative, and the one I’ll use, is the Intermediate COCOMO model. This model requires more information than the Basic COCOMO model, but it can produce higher-accuracy estimations if you can provide the data it needs. We’ll also use the version of COCOMO that uses physical SLOC (since we don’t have the logical SLOC counts). If you don’t want to know the details, feel free to skip to the next section labelled “results”.
First, we now need to determine if this is an “organic”, “embedded”, or “semidetached” application. The Linux kernel is clearly not an organic application; organic applications have a small software team developing software in a familiar, in-house environment, without significant communication overheads, and allow hard requirements to be negotiated away. It could be argued that the Linux kernel is embedded, since it often operates in tight constraints; but in practice these constraints aren’t very tight, and the kernel project can often negotiate requirements to a limited extent (e.g., providing only partial support for a particular peripheral or motherboard if key documentation is lacking). While the Linux kernel developers don’t ignore resource constraints, there are no specific constraints that the developers feel are strictly required. Thus, it appears that the kernel should be considered a “semidetached” system; this is the intermediate stage between organic and embedded. “Semidetached” isn’t a very descriptive word, but that’s the word used by the cost model so we’ll use it here. It really just means between the two extremes of organic and embedded.
The intermediate COCOMO model also requires a number of additional parameters. Here are those parameters, and their values for the Linux kernel (as I perceive them); the parameter values are based on Software Engineering Economics by Barry Boehm:
So now we can compute a new estimate for how much effort it would take to re-develop the Linux kernel 2.6:
MM-nominal-semidetached = 3 * (KSLOC)^1.12
                        = 3 * (4287.449)^1.12
                        = 35,090 MM
Effort-adjustment = 1.15 * 1.0 * 1.65 * 1.11 * 1.0 * 1.15 * 1.0 * 0.86 * 1.0 * 0.86 * 1.0 * 0.95 * 0.91 * 1.0 * 1.0
                  = 1.54869
MM-adjusted = 35,090 * 1.54869
            = 54,343.6 Man-Months
            = 4,528.6 Man-years of effort to (re)develop
If average salary = $56,286/year, and overhead = 2.40, then:
Development cost = 56286 * 2.4 * 4528.6 = $611,757,037
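For readers who want to rerun this calculation with different ratings, here is a minimal sketch of the same arithmetic. The 15 multiplier values are exactly those in the calculation above; the driver names attached to them (RELY, DATA, CPLX, and so on) are an assumption on my part that the values follow the standard cost-driver order in Boehm's book, and the function name is purely illustrative.

```python
import math

# Effort multipliers, in the order they appear in the calculation above.
# ASSUMPTION: the labels follow Boehm's usual Intermediate COCOMO driver
# order; only the numeric values come from the text.
EFFORT_MULTIPLIERS = {
    "RELY": 1.15, "DATA": 1.0, "CPLX": 1.65, "TIME": 1.11, "STOR": 1.0,
    "VIRT": 1.15, "TURN": 1.0, "ACAP": 0.86, "AEXP": 1.0, "PCAP": 0.86,
    "VEXP": 1.0, "LEXP": 0.95, "MODP": 0.91, "TOOL": 1.0, "SCED": 1.0,
}


def intermediate_cocomo_semidetached(sloc, multipliers,
                                     salary=56286.0, overhead=2.40):
    """Return (nominal_mm, adjustment, adjusted_mm, cost) for semidetached mode."""
    ksloc = sloc / 1000.0
    nominal_mm = 3.0 * ksloc ** 1.12               # nominal man-months
    adjustment = math.prod(multipliers.values())   # effort adjustment factor
    adjusted_mm = nominal_mm * adjustment
    cost = (adjusted_mm / 12.0) * salary * overhead
    return nominal_mm, adjustment, adjusted_mm, cost


nominal, eaf, adjusted, cost = intermediate_cocomo_semidetached(
    4287449, EFFORT_MULTIPLIERS)
print(f"Nominal effort:    {nominal:,.0f} man-months")
print(f"Effort adjustment: {eaf:.5f}")
print(f"Adjusted effort:   {adjusted:,.1f} man-months ({adjusted / 12:,.1f} man-years)")
print(f"Development cost:  ${cost:,.0f}")
```

Changing a single rating (for example, lowering the complexity multiplier from 1.65) shows how sensitive the final figure is to each judgment call.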
In short, it would actually cost about $612 million (US) to re-develop the Linux kernel.
Why is this estimate so much larger than Molnar’s original estimate? The answer is that SLOCCount presumes that it’s dealing with an “average” piece of software (i.e., a typical application) unless it’s given parameters that tell it otherwise. This is usually a reasonable default; almost nothing is as hard to develop as an operating system kernel. But operating system kernels are so much harder to develop that, if you include that difficulty in the calculation, the effort estimations go way up. This difficulty shows up in the nominal equation -- semidetached is fundamentally harder, and thus has a larger exponent in its estimation equation than the default for basic COCOMO. This difficulty also shows up in factors such as “complexity”; the task the kernel does is fundamentally hard. The strong capabilities of analysts and developers, use of modern practices, and programming language experience all help, but they can only partly compensate; it’s still very hard to develop a modern operating system kernel.
This difference is smoothed over in my paper More than a Gigabuck because that paper includes a large number of applications. Some of the applications would cost less than was estimated, while others would cost more; in general you’d expect that by computing the costs over many programs the differences would be averaged out. Providing that sort of information for every program would have been too time-consuming for the limited time I had available to write that paper, and I often didn’t have that much information anyway. If I do such a study again, I might treat the kernel specially, since the kernel’s size and complexity make it reasonable to treat specially. SLOCCount actually has options that allow you to provide the parameters for more accurate estimates, if you have the information they need and you’re willing to take the time to provide them. Since the nominal factor is 3, the adjustment for this situation is 1.54869, and the exponent for semidetached projects is 1.12, just providing SLOCCount with the option “--effort 4.646 1.12” would have created a more accurate estimate. But as you can see, it takes much more work to use this more detailed estimation model, which is why many people don’t do it. For many situations, a rough estimate is really all you need; Molnar certainly didn’t need a more exact estimate to make his point. And being able to give a rough estimate when given little information is quite useful.
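The “--effort 4.646 1.12” values come from folding the effort adjustment into the nominal factor (3 * 1.54869, rounded to three decimals). A small sketch of that folding follows; it does not invoke SLOCCount, and the helper name `effort_person_months` is hypothetical.

```python
# Sketch of how the two "--effort" arguments are derived. This only redoes
# the arithmetic; effort_person_months is a hypothetical helper, not part
# of SLOCCount.

def effort_person_months(ksloc, factor, exponent):
    """Generic COCOMO-style effort equation: factor * KSLOC ** exponent."""
    return factor * ksloc ** exponent


NOMINAL_FACTOR = 3.0         # semidetached nominal factor
EFFORT_ADJUSTMENT = 1.54869  # product of the 15 effort multipliers
EXPONENT = 1.12              # semidetached exponent

# Fold the adjustment into the factor, rounded as in the option shown above.
folded_factor = round(NOMINAL_FACTOR * EFFORT_ADJUSTMENT, 3)

kernel_ksloc = 4287.449
folded = effort_person_months(kernel_ksloc, folded_factor, EXPONENT)
two_step = (effort_person_months(kernel_ksloc, NOMINAL_FACTOR, EXPONENT)
            * EFFORT_ADJUSTMENT)

print(f"--effort {folded_factor} {EXPONENT}")
print(f"Folded factor: {folded:,.1f} person-months")
print(f"Two-step:      {two_step:,.1f} person-months")
```

The two results differ only by the rounding of the folded factor, which is negligible at this scale.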
In the end, Ingo Molnar’s response is still exactly correct. Offering $50K for something that would cost millions to redevelop, and is actively used and supported, is absurd.
It’s interesting to note that there are already several kernels with BSD licenses: the *BSDs (particularly FreeBSD, OpenBSD, and NetBSD). These are fine operating systems for many purposes; indeed, my website once ran on OpenBSD. But clearly, if there is a monetary offer to buy Linux code, the Linux kernel developers must be doing something right. Certainly, from a market share perspective, Linux-based systems are far more popular than systems based on the *BSD kernels. If you just want a kernel licensed under a BSD-style license, you know where to find them.
It’s worth noting that these approaches only estimate development cost, not value. All proprietary developers invest in development with the presumption that the value of the resulting product (as captured from license fees, support fees, etc.) will exceed the development cost -- if not, they’re out of business. Thus, since the Linux kernel is being actively sustained, it’s only reasonable to presume that its value far exceeds this development estimate. In fact, the kernel’s value probably well exceeds this estimate of redevelopment cost alone.
It’s also worth noting that the Linux kernel has grown substantially. That’s not surprising, given the explosion in the number of peripherals and situations that it supports. In Estimating Linux’s size, I used a Linux distribution released in March 2000, and found that the Linux kernel had 1,526,722 physical source lines of code. In More than a Gigabuck, the Linux distribution had been released in April 2001, and its kernel (version 2.4.2) was 2,437,470 physical source lines of code (SLOC). At that point, this Linux distribution would have cost more than $1 Billion (a Gigabuck) to redevelop. The much newer and larger Linux kernel considered here, with far more drivers and capabilities than the one in that paper, now has 4,287,449 physical source lines of code, and is starting to approach a Gigabuck of effort all by itself. If the kernel reaches 6,648,956 lines of code, that is, 1000 * (($1E9/$56286/2.4*12/3/1.54869) ^ (1/1.12)), then given the other assumptions it’ll represent a billion dollars of effort all by itself. And that’s just the kernel, which is only part of a working system. There are other components that weren’t included in More than a Gigabuck (such as OpenOffice.org) that are now common in Linux distributions, which are also large and represent massive investments of effort. More than a Gigabuck noted the massive rise in size and scale of OSS/FS systems, and that distributions were rapidly growing in invested effort; this brief analysis is evidence that the trend continues.
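The break-even size quoted above is just the cost formula solved for SLOC. Here is a minimal sketch of that inversion, using the same salary, overhead, nominal factor, adjustment, and exponent as the estimate in this paper; the function name `sloc_for_cost` is an illustrative choice.

```python
# Invert the Intermediate COCOMO cost formula to find the code size at which
# redevelopment cost reaches a target. sloc_for_cost is an illustrative name;
# the default constants are the ones used in this paper's estimate.

def sloc_for_cost(target_cost, salary=56286.0, overhead=2.40,
                  factor=3.0, adjustment=1.54869, exponent=1.12):
    """Return the physical SLOC whose estimated cost equals target_cost."""
    person_months = target_cost / salary / overhead * 12.0
    ksloc = (person_months / factor / adjustment) ** (1.0 / exponent)
    return ksloc * 1000.0


gigabuck_sloc = sloc_for_cost(1e9)
print(f"SLOC needed for a $1 billion estimate: {gigabuck_sloc:,.0f}")
```

The result should land within a few lines of the 6,648,956 figure; small differences come from floating-point rounding.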
In short, the amount of effort that today’s OSS/FS programs represent is rather amazing. Carl Sagan’s phrase “billions and billions,” which he applied to astronomical objects, easily applies to the effort (measured in U.S. dollars) now invested in OSS/FS programs.
I’d like to thank Ingo Molnar for doing the original analysis (using SLOCCount) that triggered this paper. Indeed, I’m always delighted to see people doing analysis instead of just guesswork. Thanks for doing the analysis! This paper is not in any way an attack on Molnar’s work; Molnar computed a quick estimate, and this paper simply uses more data to refine his effort estimation further.
Also, I’d like to tip my hat to Charles Babcock’s October 19, 2007 article “Linux Will Be Worth $1 Billion In First 100 Days of 2009”. He noticed that, by my calculations, if the Linux kernel ever reached 6.6 million lines of code, it would be worth more than $1 billion in terms of equivalent, commercial development costs. Using the current size and growth rates of the Linux kernel, he examined the trend lines and found that “Sometime during the first 100 days of 2009, Linux will cross the 6.6 million lines of code mark and $1 billion in value.”
In 2010, researchers re-did the analysis, and found that it had crossed this milestone. Jesus Garcia-Garcia and Ma Isabel Alonso de Magdaleno found that the then-latest version (2.6.30) of the Linux kernel would cost an estimated EUR 1,025,553,430 to re-develop; at the exchange rate of 1.3499 U.S. Dollars per Euro of 2010-02-25 (reported by Yahoo finance), this becomes about $1.4 billion.
The Linux kernel keeps growing; as of March 7, 2011, it would cost approximately $3 billion USD to redevelop using this estimation method.
Of course, the real story isn’t the exact numbers; it’s that instead of disappearing, FLOSS programs like the Linux kernel are thriving.
An older version of this essay was published in Groklaw. Feel free to see my home page at https://dwheeler.com. You may also want to look at my paper More than a Gigabuck: Estimating GNU/Linux’s Size, my article Why OSS/FS? Look at the Numbers!, and my papers and book on how to develop secure programs. (C) Copyright 2004-2010 David A. Wheeler. All rights reserved. This article was reprinted in Groklaw by permission. Before September 28, 2011, this article was titled Linux Kernel 2.6: It’s Worth More!, but many of the points apply to versions other than Linux kernel version 2.6.