DEV Community

Steffen Ronalter

From Subversion to Git: Snapshots

What does it mean that we talk about snapshots of our Git repository, while in Subversion we think in terms of file changes? For me at least, the key to understanding Git is that every commit is, in fact, a snapshot of the entire project. Not a list of patches. Not a difference to the previous commit. Just a snapshot of the whole thing.

Git snapshots everything, they said. Coming from Subversion, this is hard to believe. How would a version control system scale if it stored the entire project state again and again, with each and every commit?

First, let's do a little experiment on how one might approach version control intuitively, without considering either Git or Subversion.

Poor Man's Version Control

Let's say we have a project called myapp stored in a directory of that name. All it contains is a main.c file:

```
myapp
  main.c
```

Without version control, how would you track changes in order to be able to restore a particular state later? Easy enough, you might say: just create a copy of the entire myapp directory and call it something like myapp-<version>. After a while, you would end up with a bunch of backup directories:

```
myapp-01
  main.c
myapp-<...>
  main.c
myapp-<N>
  main.c
```

To step back to a previous state in history, you would replace the entire myapp directory with one of these previously created snapshots.
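To make the naive scheme tangible, here is a minimal Python sketch of it. The helper names `snapshot` and `restore`, the two-digit numbering (`myapp-01`, `myapp-02` and so on) and the throwaway temporary directory are assumptions for illustration only; no real tool is implied.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(project: Path, version: int) -> Path:
    """Copy the whole project directory to a sibling named <name>-<NN>."""
    target = project.with_name(f"{project.name}-{version:02d}")
    # copytree raises FileExistsError if the snapshot already exists,
    # so an existing snapshot is never silently overwritten.
    shutil.copytree(project, target)
    return target


def restore(project: Path, version: int) -> None:
    """Replace the working directory with a previously taken snapshot."""
    source = project.with_name(f"{project.name}-{version:02d}")
    shutil.rmtree(project)
    shutil.copytree(source, project)


def demo() -> str:
    """Take two snapshots of a hypothetical myapp, then step back to the first."""
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "myapp"
        app.mkdir()
        main_c = app / "main.c"

        main_c.write_text("int main(void) { return 0; }\n")
        snapshot(app, 1)  # creates myapp-01
        main_c.write_text("int main(void) { return 1; }\n")
        snapshot(app, 2)  # creates myapp-02

        restore(app, 1)  # step back in history
        return main_c.read_text()


if __name__ == "__main__":
    print(demo(), end="")
```

Every snapshot is a full copy of the project, which is exactly the redundancy the next step tries to squeeze out.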

To avoid wasting space on all this redundant information, you might put everything into a gzipped archive and delete the backup directories:

```
# pack all snapshot directories into one gzipped archive
tar -cvzf myapp.tgz myapp-*
# then delete the uncompressed copies
rm -r myapp-*
```

Interestingly, this naive approach is not completely different from the way Git actually works.

How Git does it

Every time you create a commit, Git takes the content of each added or modified file, compresses it and stores it as a blob in an internal object database, together with tree objects describing the directory structure and a commit object that holds some metadata such as author, date, message and parent commit[1]. This model is easy to reason about: conceptually, it is no more complicated than the naive backup approach above.
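Here is a minimal Python sketch of the blob part of this, following the loose-object layout described in the "Git Objects" section of Pro Git (reference 1): the object ID is the SHA-1 of a `blob <size>` header, a NUL byte and the raw file content; those same bytes are zlib-compressed and written to `objects/<first two hex digits>/<remaining 38 hex digits>`. The function names and the temporary directory standing in for `.git/objects` are assumptions made for this sketch; tree and commit objects, as well as repositories using SHA-256, are left out.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def blob_bytes(content: bytes) -> bytes:
    """Header `blob <size>` plus a NUL byte, followed by the raw content."""
    return f"blob {len(content)}".encode() + b"\0" + content


def blob_id(content: bytes) -> str:
    """Hex SHA-1 of the uncompressed header and content."""
    return hashlib.sha1(blob_bytes(content)).hexdigest()


def object_path(objects_dir: Path, oid: str) -> Path:
    # The first two hex digits become a subdirectory, the rest the file name.
    return objects_dir / oid[:2] / oid[2:]


def store_blob(objects_dir: Path, content: bytes) -> str:
    """Write content as a zlib-compressed loose object and return its ID."""
    oid = blob_id(content)
    path = object_path(objects_dir, oid)
    if not path.exists():  # identical content -> identical ID -> stored once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(blob_bytes(content)))
    return oid


def read_blob(objects_dir: Path, oid: str) -> bytes:
    """Decompress a loose object and strip its header again."""
    raw = zlib.decompress(object_path(objects_dir, oid).read_bytes())
    header, _, content = raw.partition(b"\0")
    if header != f"blob {len(content)}".encode():
        raise ValueError(f"corrupt or non-blob object {oid}")
    return content


def demo() -> Tuple[str, str, int, bytes]:
    """Store identical content twice, count object files, read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / "objects"  # stands in for .git/objects
        first = store_blob(objects, b"test content\n")
        second = store_blob(objects, b"test content\n")
        files = sum(1 for p in objects.rglob("*") if p.is_file())
        return first, second, files, read_blob(objects, first)


if __name__ == "__main__":
    print(demo()[:3])
```

Storing the same content twice yields the same ID and writes nothing new, which is why a file that did not change is not stored again by the next commit. With this scheme, `blob_id(b"")` is `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known ID of Git's empty blob.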

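You may notice a big drawback here, though: although individual file contents are compressed (which is fine), even a small change to a file makes Git store another complete compressed copy of that file. A file that is modified in many commits therefore ends up duplicated many times inside the object database.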

In Git, this scalability problem is simply ignored at first and solved later on. In a process called packing, objects are gathered into one or more packfiles, and similar objects are stored as deltas against each other. Git does this automatically from time to time; you can also trigger it manually with git gc (git gc --aggressive spends considerably more time searching for good deltas).
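The delta idea itself can be illustrated with a toy encoder. Git's pack format has its own binary encoding and its own heuristics for choosing which objects to delta against each other; what this sketch shares with it is only the basic shape of a delta: a list of "copy this range from the base object" and "insert these literal bytes" instructions. Python's `difflib.SequenceMatcher` is used here as a stand-in for finding matching ranges (it is not what Git uses), and `make_delta`, `apply_delta` and the sample main.c contents are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", start, end) reuses base[start:end]; ("insert", data) adds literal bytes.
Instruction = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(base: bytes, target: bytes) -> List[Instruction]:
    """Describe `target` as copy/insert instructions relative to `base`."""
    delta: List[Instruction] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": the new bytes must be stored literally
            delta.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are simply not copied
    return delta


def apply_delta(base: bytes, delta: List[Instruction]) -> bytes:
    """Rebuild the target from the base object plus the instructions."""
    out = bytearray()
    for instruction in delta:
        if instruction[0] == "copy":
            _, start, end = instruction
            out += base[start:end]
        else:
            out += instruction[1]
    return bytes(out)


def demo() -> Tuple[int, int]:
    """Return (size of the new version, literal bytes the delta has to store)."""
    v1 = b'#include <stdio.h>\n\nint main(void) {\n    puts("hello");\n    return 0;\n}\n'
    v2 = v1.replace(b"return 0;", b"return 1;")
    delta = make_delta(v1, v2)
    if apply_delta(v1, delta) != v2:
        raise AssertionError("delta does not reproduce the new version")
    literal = sum(len(i[1]) for i in delta if i[0] == "insert")
    return len(v2), literal


if __name__ == "__main__":
    print(demo())
```

For this small example, the delta stores a single literal byte and refers to the previous version for everything else, whereas two loose objects would each hold a complete compressed copy of the file.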

The drawing below is a simplified illustration of how compressed file content is stored in blob objects and how packfiles are generated.

[Figure: objects (blob storage and packfile generation)]

Conclusion

Even for everyday Git usage, it is important to understand a bit of its inner workings. It is reassuring to see that the basic idea is not inherently complex, but more or less identical to what we might have come up with ourselves.

This understanding makes it much easier to grasp more advanced features like branching, merging and rebasing.

This post was originally published on steffen.ronalter.de.

References


  1. For all the details, please refer to the section "Git Objects" of the excellent Pro Git book.

Top comments (2)

Andreas Schnapp

In my opinion, repacking is just a small optimization to save some disk space, not a fundamental concept of Git.

Git would work pretty well without this function. Git never stores the exact same object twice. So if you have two commits that differ by only one changed file, every file except the changed one is reused by the second commit. (Underneath, it's like a key-value store that uses the whole content to generate a unique key, the SHA-1 hash.)

The changed file will be stored a second time as a completely new file; Git does not track file changes. So if this file is big, it could be a bit of a waste of disk space (normally not so important today). To make such situations more efficient, you can use the repack feature.

Steffen Ronalter

You're right. Repacking is just an optimization.

Speaking in terms of the naive analogy, you don't need to compress the backup folders after all. It's an optimization to save disk space.

As suggested in a comment on the original article, this analogy might be a bit misleading...
