
In software development, a fork is a codebase that is created by duplicating an existing codebase and, generally, is subsequently modified independently of the original. Software built from a fork initially behaves identically to software built from the original code, but as the source code is increasingly modified, the resulting software tends to diverge in behavior from the original. A fork is a form of branching, but generally involves storing the forked files separately from the original, not in the same repository. Reasons for forking a codebase include user preference, stagnant or discontinued development of the original software, or a schism in the developer community.[1] Forking proprietary software (such as Unix) is prohibited by copyright law without explicit permission, but free and open-source software, by definition, may be forked without permission.
The word fork has been used to mean "to divide in branches, go separate ways" since at least the 14th century.[2]
In software development, Eric Allman used fork in the sense of creating a revision control branch as early as 1980, in the context of the Source Code Control System:[3]
Creating a branch "forks off" a version of the program.
The term was in use on Usenet by 1983 for the process of creating a subgroup to which topics of discussion could be moved.[4]
Although fork is not known to have been used in the sense of a community schism during the origins of Lucid Emacs (now XEmacs) (1991) or the Berkeley Software Distributions (BSDs) (1993–1994), Russ Nelson used the term shattering in this sense in 1993 (attributing it to John Gilmore).[5] In 1995, fork was used to describe the XEmacs split,[6] and was an understood usage in the GNU Project by 1996.[7]
The word is used similarly for the fork() system call, which causes a running process to split in two, typically so that the two resulting processes can perform different tasks in parallel.[8]
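As an illustration of the system call sense, the following minimal sketch uses Python's os.fork(), which wraps fork() on POSIX systems (it is not available on Windows). The helper name run_in_child and the message format are assumptions made for this example only. The child process writes a message into a pipe and exits, while the parent reads it, showing the two copies taking different paths after the split.

```python
import os


def run_in_child(message: str) -> str:
    """Fork the current process; the child sends `message` back via a pipe.

    Hypothetical helper for illustration; requires a POSIX system.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # returns 0 in the child, the child's PID in the parent

    if pid == 0:
        # Child branch: write the message and exit immediately,
        # without running any of the parent's remaining code.
        os.close(read_fd)
        os.write(write_fd, f"child says: {message}".encode())
        os.close(write_fd)
        os._exit(0)

    # Parent branch: close the unused write end, then read until EOF.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read().decode()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return data


if __name__ == "__main__":
    print(run_in_child("hello"))
```

Both processes continue from the point where os.fork() returns; only the return value tells them apart, which is why the code branches on pid.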
Free and open-source software may be legally forked without prior approval of those currently developing, managing, or distributing the software, per both The Free Software Definition and The Open Source Definition:[9]
The freedom to distribute copies of your modified versions to others (freedom 3). By doing this, you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
3. Derived Works: The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
In free software, forks often result from a schism over different goals or from personality clashes. In a fork, both parties start with nearly identical code bases, but typically only the larger group, or whoever controls the web site, will retain the full original name and the associated user community. Thus, there is a reputation penalty associated with forking.[9] The relationship between the different teams can be cordial or very bitter. On the other hand, a friendly fork or a soft fork is a fork that does not intend to compete, but wants to eventually merge with the original.
Eric S. Raymond, in his essay Homesteading the Noosphere,[12] stated that "The most important characteristic of a fork is that it spawns competing projects that cannot later exchange code, splitting the potential developer community". He notes in the Jargon File:[13]
Forking is considered a Bad Thing—not merely because it implies a lot of wasted effort in the future, but because forks tend to be accompanied by a great deal of strife and acrimony between the successor groups over issues of legitimacy, succession, and design direction. There is serious social pressure against forking. As a result, major forks (such as the Gnu-Emacs/XEmacs split, the fissioning of the 386BSD group into three daughter projects, and the short-lived GCC/EGCS split) are rare enough that they are remembered individually in hacker folklore.
David A. Wheeler notes[9] four possible outcomes of a fork, with examples:
Distributed revision control (DVCS) tools have popularised a less emotive use of the term "fork", blurring the distinction with "branch".[14] With a DVCS such as Mercurial or Git, the normal way to contribute to a project is to first create a personal branch of the repository, independent of the main repository, and later seek to have your changes integrated with it. Sites such as GitHub, Bitbucket and Launchpad provide free DVCS hosting that expressly supports independent branches, so that the technical, social and financial barriers to forking a source code repository are massively reduced, and GitHub uses "fork" as its term for this method of contributing to a project.
Forks often restart version numbering at values typically used for initial releases, such as 0.0.1, 0.1, or 1.0, even if the original software was at another version such as 3.0, 4.0, or 5.0. An exception is sometimes made when the forked software is designed to be a drop-in replacement for the original project, e.g. MariaDB for MySQL[15] or LibreOffice for OpenOffice.org.
The BSD licenses permit forks to become proprietary software, and copyleft proponents say that commercial incentives thus make proprietisation almost inevitable. (Copyleft licenses can, however, be circumvented via dual-licensing with a proprietary grant in the form of a Contributor License Agreement.) Examples include macOS (based on the proprietary NeXTSTEP and the open source FreeBSD), Cedega and CrossOver (proprietary forks of Wine, though CrossOver tracks Wine and contributes considerably), EnterpriseDB (a fork of PostgreSQL, adding Oracle compatibility features[16]), Supported PostgreSQL with their proprietary ESM storage system,[17] and Netezza's[18] proprietary highly scalable derivative of PostgreSQL. Some of these vendors contribute back changes to the community project, while some keep their changes as their own competitive advantages.
In proprietary software, the copyright is usually held by the employing entity, not by the individual software developers. Proprietary code is thus more commonly forked when the owner needs to develop two or more versions, such as a windowed version and a command line version, or versions for differing operating systems, such as a word processor for IBM PC compatible machines and Macintosh computers. Generally, such internal forks will concentrate on having the same look, feel, data format, and behavior between platforms so that a user familiar with one can also be productive on the other or share documents generated on it. This is almost always an economic decision, made to gain a greater market share and thus pay back the extra development costs created by the fork.
Notable proprietary forks not of this kind are the many varieties of proprietary Unix, almost all derived from AT&T Unix under license and all called "Unix", but increasingly mutually incompatible.[19] See Unix wars.
Forks are a natural part of the open development model, so much so that GitHub famously plasters a "fork your own copy" button on almost every page. See also Nyman, Linus (2015). Understanding Code Forking in Open Source Software (PhD). Hanken School of Economics. p. 57. hdl:10138/153135.
Where practitioners have previously had rather narrow definitions of a fork, [...] the term now appears to be used much more broadly. Actions that would traditionally have been called a branch, a new distribution, code fragmentation, a pseudo-fork, etc. may all now be called forks by some developers. This appears to be in no insignificant part due to the broad definition and use of the term fork by GitHub.