
This document is a brief introduction to version control. After reading it, you will be prepared to perform simple tasks using a version control system, and to learn more from other documents that may lack a high-level conceptual overview. Most of its advice is applicable to all version control systems, but its examples mostly use Git for concreteness.
This document's main purpose is to lay out philosophy and advice that I haven't found elsewhere in one place. It is not an exhaustive reference to the syntax of particular commands. This document covers basics, but it does not go into advanced topics like branching, nor does it discuss the ways in which large projects use version control differently than small ones.
(Also see How to create and review a GitHub pull request.)
If you are already familiar with version control, you can skim or skip this section.
A version control system serves the following purposes, among others.
Version control uses a repository (a database of program versions) and a working copy where you edit files.
Your working copy (sometimes called a checkout or clone) is your personal copy of all the files in the project. You make arbitrary edits to this copy, without affecting your teammates. When you are happy with your edits, you commit your changes to a repository.
A repository is a database of all the edits to your project. Equivalently, it is a database of all historical versions (snapshots) of your project. It is possible for the repository to contain edits that have not yet been applied to your working copy. You can update your working copy to incorporate any new edits or versions that have been added to the repository since the last time you updated. See the diagram at the right.
In the simplest case, the database contains a linear history: each change is made after the previous one. Another possibility is that different users made edits simultaneously (this is sometimes called “branching”). In that case, the version history splits and then merges again. The picture below gives examples. In the timeline on the right, Version 4 is called a "merge".
There are two general varieties of version control: centralized and distributed. Distributed version control is more modern, runs faster, is less prone to errors, has more features, and is more complex to understand. Centralized version control is rarely used nowadays.
The most popular version control system is Git (distributed). In practice, most projects use Git. Other version control systems are Mercurial (distributed) and Subversion (centralized).
The main difference between centralized and distributed version control is the number of repositories. In centralized version control, there is just one repository, and in distributed version control, there are multiple repositories. Here are pictures of the typical arrangements:
In centralized version control, each user gets their own working copy, but there is just one central repository. As soon as you commit, it is possible for your co-workers to update and to see your changes. For others to see your changes, 2 things must happen:
1. You commit.
2. They update.
In distributed version control, each user gets their own repository and working copy. After you commit, others have no access to your changes until you push your changes to the central repository. When you update, you do not get others' changes unless you have first fetched those changes into your local repository. For others to see your changes, 4 things must happen:
1. You commit.
2. You push.
3. They fetch.
4. They update.
(git pull does both fetch and update, in one Git command.)
Notice that the commit and update commands only move changes between the working copy and the local repository, without affecting any other repository. By contrast, the push and fetch commands move changes between the local repository and the central repository, without affecting your working copy.
The git pull command is equivalent to git fetch followed by an update of your working copy (Git performs that update step with git merge; it has no command named git update). This is handy because you usually want to perform both operations. But this makes Git's terminology a bit confusing, since push and pull are not complements of one another. (Mercurial's naming is more logical: Mercurial's pull operation is like git's fetch, and Mercurial's fetch operation is like git's pull; that is, hg fetch performs both hg pull and hg update.)
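To make the split between fetching and updating concrete, here is a minimal sketch that builds two throwaway clones of a local bare repository standing in for the central one. The directory names (central.git, alice, bob), the file name, and the identities are made up for the demo, and git merge --ff-only plays the role of the update step.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare central.git            # stand-in for the central repository

# Alice clones, commits, and pushes the first version.
git clone -q central.git alice 2>/dev/null
cd alice
git config user.name "Alice Example"      # placeholder identity for the demo
git config user.email "alice@example.com"
printf 'version 1\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
main_branch=$(git symbolic-ref --short HEAD)
git push -q origin "$main_branch"
cd ..

# Bob clones; afterwards Alice pushes a second change.
git clone -q central.git bob
cd alice
printf 'version 2\n' > notes.txt
git commit -q -m "Update notes" notes.txt
git push -q origin "$main_branch"
cd ../bob

# fetch copies Alice's commit into Bob's repository only.
git fetch -q origin
after_fetch=$(cat notes.txt)              # working copy is unchanged

# The update step (a fast-forward merge here) changes the working copy.
git merge -q --ff-only "origin/$main_branch"
after_merge=$(cat notes.txt)
echo "after fetch: $after_fetch; after merge: $after_merge"
```

Running git pull in Bob's clone would have performed both of the last two steps at once.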
The diagram at right shows which operations affect the working copy (red), which affect the repository (blue), and which affect both (purple). Merging, git add, and the staging area are explained later.
A version control system lets multiple users simultaneously edit their own copies of a project. Usually, the version control system is able to merge simultaneous changes by two different users: for each line, the final version is the original version if neither user edited it, or is the edited version if one of the users edited it. A conflict occurs when two different users make simultaneous, different changes to the same line of a file. In this case, the version control system cannot automatically decide which of the two edits to use (or a combination of them, or neither!). Manual intervention is required to resolve the conflict.
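The line-by-line rule can be tried without a repository, using git merge-file, which performs a three-way merge on plain files. This is only a sketch: the file names and the users Alice, Bob, and Carol are invented for the demo.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# The common ancestor and two users' simultaneous edits of it.
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > base.txt
printf 'ALPHA\nbeta\ngamma\ndelta\nepsilon\n' > alice.txt   # Alice edits line 1
printf 'alpha\nbeta\ngamma\ndelta\nEPSILON\n' > bob.txt     # Bob edits line 5

# Different lines were edited, so the merge completes automatically.
if git merge-file -p alice.txt base.txt bob.txt > merged.txt; then
  echo "Alice + Bob: merged cleanly"
fi
cat merged.txt

# Carol also edits line 1, differently from Alice: a conflict.
printf 'Alpha!\nbeta\ngamma\ndelta\nepsilon\n' > carol.txt
if git merge-file -p alice.txt base.txt carol.txt > conflicted.txt; then
  echo "Alice + Carol: merged cleanly"
else
  echo "Alice + Carol: conflict, manual resolution needed"
fi
cat conflicted.txt   # contains <<<<<<< ======= >>>>>>> conflict markers
```

The first merge keeps Alice's line 1 and Bob's line 5; the second leaves both candidate versions of line 1 between conflict markers for a person to resolve.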
“Simultaneous” changes do not necessarily happen at the exact same moment of time. Change 1 and Change 2 are considered simultaneous if:
In a distributed version control system, push and fetch never cause a conflict. These operations can cause the repository to contain multiple different (perhaps mutually exclusive) histories that coexist. There is an explicit operation, called merge, that combines simultaneous edits by two different users. Sometimes merge completes automatically, but if there is a conflict, merge requests help from the user.
Usually you don't have to run merge explicitly, because it is automatically run by the update step (recall that git pull runs git fetch and then updates your working copy by merging).
It is better to avoid a conflict than to resolve it later. The best practices below give ways to avoid conflicts; for example, teammates should frequently share their changes with one another.
Conflicts are bound to arise despite your best efforts. It's smart to practice conflict resolution ahead of time, rather than when you are frazzled by a conflict in a real project. You can do so in this tutorial about Git conflict resolution.
If you run git config --global merge.conflictStyle zdiff3 on every computer where you use git (the zdiff3 style requires Git 2.35 or later), then git's merge conflicts will be more informative because they will show not just the differences between the two parent commits of the conflict, but also the common ancestor of the two parents. Knowing the common ancestor is often essential to resolving the merge conflict correctly.
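To see what the extra section looks like, the sketch below produces a conflict with git merge-file. As an assumption made for portability, it uses the older diff3 style (the --diff3 option) rather than zdiff3, so that it also runs on Git versions before 2.35; zdiff3 shows the same three sections but moves lines common to both sides out of the conflict region. The file names and contents are invented.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'timeout = 30\n' > ancestor.conf   # common ancestor
printf 'timeout = 60\n' > mine.conf       # my edit
printf 'timeout = 45\n' > theirs.conf     # their edit

# diff3 style: the ancestor's text appears between the ||||||| and
# ======= markers, in addition to the two conflicting sides.
git merge-file -p --diff3 mine.conf ancestor.conf theirs.conf > three_way.txt \
  || echo "conflict reported, as expected"
cat three_way.txt
```

With the ancestor visible, you can tell that both sides changed the original value of 30, rather than one side having reverted the other.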
Recall that update changes the working copy by applying any edits that appear in the repository but have not yet been applied to the working copy.
In a centralized version control system, you can update (for example, svn update) at any moment, even if you have locally-uncommitted changes. The version control system merges your uncommitted changes in the working copy with the ones in the repository. This may force you to resolve conflicts. It also loses the exact set of edits you had made, since afterward you only have the combined version. The implicit merging that a centralized version control system performs when you update is a common source of confusion and mistakes.
In a distributed version control system, if you have uncommitted changes in your working copy, then in some cases you cannot run update (or other commands like git pull that themselves invoke update). The reason is that it would be confusing and error-prone for the version control system to try to apply edits, when you are in the middle of editing. You will receive an error message such as
abort: outstanding uncommitted changes
Before you are allowed to update, you must first commit any changes that you have made (you should continue editing until they are logically complete first, of course). Now, your repository database contains simultaneous edits: the ones you just made, and the ones that were already there and you were trying to apply to your working copy by running update. You need to merge these two sets of edits, then commit the result. The reason you need the commit is that merging is an operation that gets recorded by the version control system, in order to record any choices that you made during merging. In this way, the version control system contains a complete history and clearly records the difference between you making edits and you merging simultaneous work.
The advice in this section applies to both centralized and distributed version control.
These best practices do not cover obscure or complex situations. Once you have mastered these practices, you can find more tips and tricks elsewhere on the Internet.
It only takes a moment to write a good commit message. A good message is useful when someone is examining the change, because it indicates the purpose of the change. It is also useful when someone is looking for changes related to a given concept, because they can search through the commit messages.
Each commit to the main branch should have a single purpose and should completely implement that purpose. For example, a commit should not contain a bug fix, a new feature, and a typo fix; you should make three different commits instead. Many commits to the main branch are made via pull requests from a different branch. Therefore, this advice can also be stated as: each branch should have a single purpose and, at the time it is merged into the main branch, should completely implement that purpose.
This makes it easier to locate the changes related to some particular feature or bug fix, to see them all in one place, to undo them, to determine the changes that are responsible for buggy behavior, etc. The utility of the version control history is compromised if one commit contains code that serves multiple purposes, or if code for a particular purpose is spread across multiple different commits.
During the course of one task, you may notice something else you want to change. You may need to commit one file at a time; the commit command of every version control system supports this.
In Git, git commit file1 file2 commits the two named files.
Alternatively, git add file1 file2 “stages” the two named files, causing them to be committed by the next git commit command that is run without any filename arguments.
You can also stage part of a file. This is useful if you want to exclude local changes, such as diagnostics. Version control tools support interactive staging (in Git, git add -p), in which the tool asks the user, for each changed hunk, whether to stage that hunk or not. It can take a little while to get used to these tools and to the staging area more generally, but you will eventually find them quite useful.
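As a small illustration of the staging area, the sketch below edits two files but commits only one of them. The repository is a throwaway one, and the identity and file contents are placeholders; interactive staging is left out because it needs a terminal.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

printf 'one\n' > file1
printf 'two\n' > file2
git add file1 file2
git commit -q -m "Add file1 and file2"

# Edit both files, but stage only file1.
printf 'one, fixed\n' > file1
printf 'two, with debugging output\n' > file2
git add file1

git diff --staged --name-only    # what the next commit will contain
git diff --name-only             # what will stay uncommitted

git commit -q -m "Fix wording in file1"
git status --short               # file2 is still listed as modified
```

The debugging edit to file2 stays in the working copy and out of the history until you decide to stage it or discard it.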
It is very easy to commit more changes than you intended to. Double-check the changes before you make a commit. Here are some commands that you should likely run before each commit.
    # Lists all the modified files
    git status
    # Shows specific differences, helps me compose a commit message.
    # If I am using the staging area:
    git diff --staged
    # Whether I am using the staging area or not:
    git diff
    # If I am using the staging area:
    git commit -m "Descriptive commit message"
    # If I am not using the staging area, commit just the files I want to:
    git commit file1 file2 -m "Descriptive commit message"
Work with the most up-to-date version of the files possible. That means that you should run git pull very frequently. I do this every day, on each of the hundreds of projects that I am involved with. (I use the program multi-version-control.)
When two people make conflicting edits simultaneously, then manual intervention is required to resolve the conflict. But if someone else has already completed a change before you even start to edit, it is a huge waste of time to create, then manually resolve, conflicts. You would have avoided the conflicts if your working copy had already contained the other person's changes before you started to edit.
Once you have committed the changes for a complete, logical unit of work, you should share those changes with your colleagues as soon as possible (by doing git push). So long as your changes do not destabilize the system, you should not hold the changes locally while you make unrelated changes. The reason is the same as the reason for incorporating others' changes frequently.
The version control system can often merge changes that different people made simultaneously. However, when two people edit the same line, then this is a conflict that a person must manually resolve. To avoid this tedious, error-prone work, you should strive to avoid conflicts.
If you plan to make significant changes to a file that others may be editing, coordinate with them so that one of you can finish work (commit and push it) before the other gets started. Examples include a wide-scale renaming or other code reorganization.
Version control tools record changes and determine conflicts on a line-by-line basis. The following advice applies to editing marked-up text (LaTeX, HTML, etc.). It does not apply when editing WYSIWYG text (such as a plain text file), in which the intended reader sees the original source file.
Never refill/justify paragraphs. Doing so changes every line of the paragraph and makes merge conflicts likely. Refilling paragraphs also makes it hard to determine, later, what part of the content changed in a given commit, and which commits affected given content (as opposed to just reformatting it). If you follow this advice and do not refill/rejustify the text, then the LaTeX/HTML source might look a little bit funny, with some short lines in the middle of paragraphs. But, no one sees that except when editing the source, and the version control information is more important.
Do not write excessively long lines; as a general rule, keep each line to about 80 characters. The more characters are on a line, the larger the chance that multiple edits will fall on the same line and thus will conflict. Also, the more characters, the harder it is to determine the exact changes when viewing the version control history. As another benefit to authors of the document, 80-character lines are also easier to read when viewing/editing the source file.
Version control is intended for files that people edit. Generated files should not be committed to version control. For example, do not commit binary files that result from compilation, such as .o files or .class files. Also do not commit .pdf files that are generated from a text formatting application; as a rule, you should only commit the source files from which the .pdf files are generated.
make, gradle, mvn, ant, etc.
To tell your version control system to ignore given files, create a .gitignore file at the top level of your repository, or create a .gitignore-global file in your user home directory and tell Git to use it by running git config --global core.excludesFile ~/.gitignore-global.
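For example, a repository-level .gitignore for the generated files mentioned above might look like the sketch below. The patterns and file names are chosen for illustration only; adapt them to what your own build produces.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Ignore compiler output and generated PDFs (illustrative patterns).
cat > .gitignore <<'EOF'
*.o
*.class
*.pdf
EOF

touch main.c main.o paper.tex paper.pdf
git status --short                      # lists sources and .gitignore, not main.o or paper.pdf
git check-ignore -v main.o paper.pdf    # shows which pattern ignores each file
```

The .gitignore file itself is normally committed, so that every clone ignores the same generated files.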
The least pleasant part of working with version control is resolving conflicts. If you follow best practices, you will have to resolve conflicts relatively rarely.
You are most likely to create conflicts at a time you are stressed out, such as near a deadline. You do not have time, and are not in a good mental state, to learn a merge tool. So, you should make sure that you understand your merge tool ahead of time. When an actual conflict comes up, you don't want to be thrown into an unfamiliar UI and make mistakes. Practice on a temporary repository to give yourself confidence. You can do so in this tutorial about Git conflict resolution.
A version control system lets you choose from a variety of merge tools; to see Git's list, run git mergetool --tool-help. Select the one you like best. Many of the merge tools start an interactive GUI program. If you don't want that, you can configure your version control system to attempt the merge and write a file with conflict markers if the merge is not successful. Then, you can use your favorite editor, or a tool that is not a git mergetool.
Obtaining your own working copy of the project is called “cloning” or “checking out”:
    git clone URL
    hg clone URL
    svn checkout URL
Use your version control's documentation to learn how to create a new repository. Example commands include git init, hg init, and svnadmin create, but you will more often create a repository at a hosting site such as GitHub, then clone it locally.
A typical workflow when using Git is:
1. git pull
2. git branch NEW-BRANCH-NAME
3. git checkout NEW-BRANCH-NAME
4. git status and git diff
5. git commit, or git add then git commit
6. git pull
7. git push
Note that an invocation of git pull or hg fetch may force you to resolve a conflict.
That's pretty much all you need to know, besides how to clone an existing repository.
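The workflow above can be tried end to end against a local stand-in for the central repository. In this sketch the branch name fix-typo, the README contents, and the identity are hypothetical, and the pull and push commands name the remote and branch explicitly because a freshly created branch has no upstream configured yet.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare central.git            # stand-in for the hosted repository

# Seed the central repository with one commit so there is something to pull.
git clone -q central.git seed 2>/dev/null
cd seed
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"
printf 'Helo, world\n' > README
git add README
git commit -q -m "Add README"
main_branch=$(git symbolic-ref --short HEAD)
git push -q origin "$main_branch"
cd ..

# The workflow itself, in a fresh clone.
git clone -q central.git mine
cd mine
git config user.name "Demo User"
git config user.email "demo@example.com"
git pull -q --no-rebase origin "$main_branch"   # get others' changes
git branch fix-typo                             # create a branch ...
git checkout -q fix-typo                        # ... and switch to it
printf 'Hello, world\n' > README                # make an edit
git status --short                              # review which files changed
git diff                                        # review the exact changes
git commit -q -m "Fix typo in README" README    # commit
git pull -q --no-rebase origin "$main_branch"   # pick up new work, if any
git push -q -u origin fix-typo                  # share the branch
```

On a hosting site such as GitHub, the pushed branch would then typically become a pull request against the main branch.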
hg fetch, not hg pull
(This tip is specific to Mercurial. In Git, just use git pull. Git's pull acts similarly to Mercurial's fetch.)
I never run hg pull. Instead, I use hg fetch. It is the most effective way to get everyone else's changes into my working copy. The hg fetch command is like hg pull then hg update, but it is even more useful. The actual effect of hg fetch is:
    hg pull
    hg merge
    hg commit
    hg update
To enable the hg fetch command, add the following to your $HOME/.hgrc file or equivalent:
    [extensions]
    fetch =
There is nothing after the “=” in “fetch =”.
Git or Mercurial occasionally refuses to do a particular action, such as pushing to a remote repository when you have not yet fetched all its changes. For example, Mercurial indicates this problem by outputting:
    abort: push creates new remote heads!
    (did you forget to merge? use push -f to force)
In the second line of the message, Mercurial makes two suggestions:
1. Merge: run hg fetch (not hg merge), then you can try again to push.
2. Force: use the -f command-line option, which stands for “force” and can also be written as --force.
Do not use -f or --force: doing so is likely to cause extra work for your team, such as making multiple people perform the same merge.
git rebase is a powerful command that lets you rewrite the version control history. Rebasing can change a commit, change commit messages, reorder commits, squash multiple commits into one, split one commit into multiple commits, delete commits, and more.
Never use rebase, including git pull -r. (Until you are more experienced with git. And, then still don't use it.)
Rewriting history is ineffective if anyone else has cloned your repository. Your changes to history will be added to the existing history in all remote repositories, which is not the effect you wanted. Rewriting history frequently causes difficult merge conflicts, and it may force those merge conflicts on multiple users. Rewriting history makes it more difficult to understand the actual development history, and recording the true development history is a main purpose of a version control system.
If you want to keep your development history clean, there are better ways than rewriting history, such as squash-and-merging GitHub pull requests.
As explained above, you cannot update until you commit and merge. You will see an error message like
abort: outstanding uncommitted changes
But, sometimes you really want to incorporate others' changes even though your changes are not yet in a logically consistent state and ready to commit to your local repository.
A low-tech solution is to revert your changes with hg revert or the analogous command for other version control systems. Now, you can git pull or hg fetch, but you will have to manually re-do the changes that you moved aside. There are other, more sophisticated ways to do this as well (for Git, use git stash; for Mercurial, see the Mercurial FAQ).
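For Git, the stash approach can be sketched as follows. The setup uses a local bare repository in place of the central one, and the users, file name, and contents are invented; Alice's pushed change and Bob's unfinished edit touch the same file, which is the situation in which Git refuses to pull over uncommitted changes.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare central.git            # stand-in for the central repository

git clone -q central.git alice 2>/dev/null
cd alice
git config user.name "Alice Example"      # placeholder identities for the demo
git config user.email "alice@example.com"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
main_branch=$(git symbolic-ref --short HEAD)
git push -q origin "$main_branch"
cd ..

git clone -q central.git bob
cd bob
git config user.name "Bob Example"
git config user.email "bob@example.com"

# Alice publishes an edit to line 1.
cd ../alice
printf 'LINE 1\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git commit -q -m "Capitalize line 1" notes.txt
git push -q origin "$main_branch"

# Bob has a half-finished, uncommitted edit to line 5 of the same file.
cd ../bob
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (work in progress)\n' > notes.txt

git stash -q                                    # set the edit aside
git pull -q --no-rebase origin "$main_branch"   # working copy is clean, so this succeeds
git stash pop -q                                # re-apply the edit on top of Alice's change
cat notes.txt
```

Afterward the working copy has Alice's line 1 and Bob's still-uncommitted line 5. If the stashed edit and the pulled changes had touched the same lines, git stash pop would report a conflict to resolve.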
SVN (Subversion) automatically caches your password. You have to type the password only the first time.
GitHub and other hosting services give you multiple URLs of the main repository (the one on the GitHub servers) that you can clone.
If you clone git@github.com:owner/repo.git, then you can use an SSH key and the ssh-agent program, and you will never need to type a password. See GitHub's instructions.
If you clone https://github.com/owner/repo.git, then you can install a credential manager, or the GitHub CLI, so you only have to type your credentials once. See GitHub's instructions.
Here are two ways to have Mercurial remember/cache your password so you don't have to type it every time.
hg clone https://michael.ernst:my-password-here@jsr308-langtools.googlecode.com/hg/ jsr308-langtools
In your .hgrc file (which should not be world-readable!), add this section:
    # The below only works in Mercurial 1.3 and later
    [auth]
    googlecode.prefix = code.google.com
    googlecode.username = michael.ernst
    googlecode.password = my-password-here
    googlecode.schemes = https
    dada.prefix = dada.cs.washington.edu/hgweb/
    dada.username = mernst
    dada.password = my-password-here
    dada.schemes = https
It's a good idea to set up email notification. Then, every time someone pushes (in distributed version control) or commits (in centralized version control) all the relevant parties get an email about the changes to the central server.
If you are using a hosted service such as GitHub or Bitbucket, it's easy to set up email notification on their website; here are the GitHub instructions.
Michael Ernst