Git is a source code management system for software development. It was initially developed by
Linus Torvalds when he found that the similar tools available did not meet his requirements for Linux kernel development. The main differentiators of Git compared to similar tools are speed, data integrity, and most importantly support for distributed and non-linear workflows. Popular tools like CVS and Subversion are non-distributed by design. They have a central repository and everybody works against it. With Git there can be many repositories, and it supports collaboration between repositories. The local copy itself is a fully fledged repository with the complete history and all the version tracking commands and tools. With Subversion, the local copy is just a set of files plus some metadata used to figure out what local changes, if any, have been made.
In a video I watched recently (a Google Tech Talk), Linus explains Git from a design point of view. He talks about things like speed, data integrity and merge capabilities, but the most notable point of all was the distributed nature of Git and why such behaviour is needed [at least for the development of the Linux kernel]. It is very interesting to see how large open source projects work, which he explains very clearly in this presentation.
Large open source projects receive contributions from thousands of developers all over the world. They are not all experts, and they do not all have the same beliefs and styles of work. Most importantly, they do not all know each other well. So how do such projects proceed, and how do they make sure the code is not broken by somebody? It all happens in a model which Linus nicely explains, "the network of trust", using his own project, Linux kernel development, as the example.
What is network of trust?
As Linus explains, he trusts a small set of people because he knows them personally and has seen their coding and the quality of their work. So when he gets an enhancement done by one of them, he knows it works (the 'trust'), and he can merge the changes into the main code base with a limited amount of review. Each of these trusted guys also has a set of trusted people, and they work in the same way. This continues for many levels, and that is what is called the network of trust.
If you are a node in this network of trust, you know the people below you and what capabilities they have. So when they make changes, you know how much to test and how much to review before passing them upwards in the trust chain. You also know what sort of expectations the guy above you in the chain has, so before passing anything up you make sure everything works. This goes on ... and using this model thousands of developers work on one project based on the network of trust. This is how the Linux kernel gets enhanced, and how many other open source projects actually progress with thousands of developers around the world.
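To make the chain idea concrete, here is a minimal Python sketch of a change travelling upwards through several levels of reviewers. Everything in it (the `Node` class, the `pass_upwards` function, the three names and the `careful` review rule) is made up for illustration; it only models the flow described above and has nothing to do with how Git itself is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One person in a hypothetical network of trust."""
    name: str
    upstream: Optional["Node"] = None          # the person who trusts this node
    accepted: List[str] = field(default_factory=list)


def pass_upwards(change: str, start: Node,
                 review: Callable[[Node, str], bool]) -> bool:
    """Hand a change to `start`, then up the chain while each reviewer accepts it.

    Returns True only if the change reaches the top of the chain.
    """
    node: Optional[Node] = start
    while node is not None:
        if not review(node, change):
            return False            # rejected here; upper levels never see it
        node.accepted.append(change)
        node = node.upstream
    return True


# A three-level chain: maintainer <- lieutenant <- reviewer.
maintainer = Node("maintainer")
lieutenant = Node("lieutenant", upstream=maintainer)
reviewer = Node("reviewer", upstream=lieutenant)


def careful(node: Node, change: str) -> bool:
    """Illustrative rule: the lieutenant refuses anything marked untested."""
    return not (node.name == "lieutenant" and "untested" in change)


print(pass_upwards("fix scheduler bug", reviewer, careful))  # True
print(pass_upwards("untested driver", reviewer, careful))    # False
print(maintainer.accepted)                                   # ['fix scheduler bug']
```

The point of the sketch is that the person at the top only ever sees changes that every level below has already accepted, which is why a limited review is enough there.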
Can Subversion support network of trust?
Well, it can, but not very easily. Subversion and CVS are both built around a centralized repository. The closest thing you can do is have branches for the various sub groups to work in. But the network of trust is not a two level concept; it can go many more levels deep, and there is nothing like sub branches or sub-sub branches in Subversion or similar tools. You could have a separate repository for each node in the network of trust, but Subversion does not have tools to merge between repositories, which is what this model needs. So with all respect to Subversion, it is fair to say that Subversion is not the right tool for projects which need a network of trust.
What makes Git suitable for Network of Trust?
In Git even your local copy is a fully-fledged repository. If you are a node in the network of trust for a project, you can open your repository to the people you trust. They clone your repo to create their own repos, which they in turn open to the people they trust. The people you trust can send their work to your repo using Git commands, and once you have done the due diligence and made sure everything works, you can pass it up to the repository of the node above you, for whom you are a trusted person. So it is Git's features that make this network of trust possible.
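The flow of work between repositories can be sketched with a toy model in Python. The `Repo` class and the commit ids below are hypothetical, and history is reduced to a plain list of ids; real Git stores a graph of commits and merges file contents, so treat this only as a picture of how history moves from clone to clone. The sketch uses a "pull" from the receiving side, which is one way the hand-over can happen.

```python
class Repo:
    """Toy repository: history is just an ordered list of commit ids."""

    def __init__(self, owner, history=None):
        self.owner = owner
        self.history = list(history or [])

    def commit(self, commit_id):
        self.history.append(commit_id)

    def clone(self, new_owner):
        # A clone copies the complete history, not just the latest files.
        return Repo(new_owner, self.history)

    def pull(self, other):
        # Take every commit from `other` that this repository lacks.
        new = [c for c in other.history if c not in self.history]
        self.history.extend(new)
        return new


upper = Repo("upper node", ["c1", "c2"])
mine = upper.clone("me")                 # my repo has the full history
theirs = mine.clone("someone I trust")   # and so does theirs

theirs.commit("c3")                      # they work in their own repo
print(mine.pull(theirs))                 # ['c3']  I review, then take their work
print(upper.pull(mine))                  # ['c3']  the node above me takes it from me
print(upper.history)                     # ['c1', 'c2', 'c3']
```

Notice that no repository in the sketch is special: every clone has the same capabilities, and which one counts as "upper" is purely a matter of who trusts whom.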
Does this network of trust model fit all projects?
Well, it definitely fits what Linus does (otherwise he wouldn't have bothered to write Git!). Other large open source projects also fit. They may not be as religious as Linus about coordinating work through a network of trust, but they work in more or less the same fashion, so Git fits them. What about a project with 4 people? Well.... that is the main subject of this writeup.
Are you saying Git is of no use for small projects?
In my humble view the answer is 'it depends'.
Support for the network of trust model is the key feature of Git, but there are more. Speed is also an important feature: Git is definitely much faster than Subversion when it comes to running commands. But 1 second becoming 10 milliseconds does not make much difference for a code commit. Data integrity... well... how many of you have found a Subversion repo corrupted? Even if that happens, we do have backups of the repo and all developers have copies on their local hard disks, so for me that is still not a solid reason to jump to Git.
But there is one.. and that is how well Git merges code. Subversion really sucks when it comes to code merging.
What is the problem in Subversion when it comes to code merge?
In Subversion, when you create a branch and start making changes to it, the code in the branch differs from the code in trunk. Branching is done when you want to work on a feature or a bug fix without disturbing the people working in trunk. But eventually we need to merge things back to trunk so that the main code base is up to date. As long as you merge a branch to trunk just once, everything is fine in Subversion. Problems start to crop up when you have merged the branch to trunk once and want to merge the same branch to trunk again. Subversion does not keep track of the changes that have already been merged to trunk, so it cannot work out which changes have not been merged yet. This was the case until merge tracking was introduced in Subversion 1.5. Version 1.6 can manage this according to the documentation, but I believe it is still incomplete. For example, if you delete a file and recreate it in a branch and then try to merge the branch to trunk, Subversion still cannot do the merge properly. BTW, these problems exist not only between trunk and branches but between two branches as well.
When it comes to Git, merging really works! Of course, in the network of trust model merging is the name of the game. If you are really in pain with Subversion merge issues, then that is a good enough reason to say goodbye to Subversion. Git is that good at merging.
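The repeated-merge problem can be illustrated with a small Python toy. It is not how Subversion or Git actually work internally: revisions are reduced to (id, line) pairs, trunk is a list of lines, and the function names are invented for this sketch. The duplicated line simply stands in for the trouble a second merge causes when the tool has no record of the first one.

```python
def merge_without_tracking(trunk, branch_revs):
    """Re-apply every branch revision, as if nothing had been merged before."""
    for _rev_id, line in branch_revs:
        trunk.append(line)


def merge_with_tracking(trunk, branch_revs, merged_ids):
    """Apply only the revisions whose ids are not yet recorded in merged_ids."""
    applied = []
    for rev_id, line in branch_revs:
        if rev_id not in merged_ids:
            trunk.append(line)
            merged_ids.add(rev_id)
            applied.append(rev_id)
    return applied


# Without tracking: merge once, add more work to the branch, merge again.
trunk_a = []
branch = [(1, "add login")]
merge_without_tracking(trunk_a, branch)
branch.append((2, "add logout"))
merge_without_tracking(trunk_a, branch)
print(trunk_a)   # ['add login', 'add login', 'add logout']

# With tracking: the second merge picks up only the new revision.
trunk_b = []
merged = set()
branch = [(1, "add login")]
merge_with_tracking(trunk_b, branch, merged)
branch.append((2, "add logout"))
print(merge_with_tracking(trunk_b, branch, merged))   # [2]
print(trunk_b)   # ['add login', 'add logout']
```

Without a record of what was merged, the person doing the merge has to remember the revision ranges by hand, and that is where the pain comes from.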
Didn't we learn branching is evil?
I tend to follow Martin Fowler a lot, and his Continuous Integration concepts in particular. Martin discourages branching; his motto is that branching is evil, hence we should avoid it by working on trunk as much as we can. As he explains, when you are in a branch you are not integrating with the rest of the code, and the longer you stay away, the higher the chances that you are breaking the code. When you code in trunk, which has all the changes from all developers, you find out soon when you break something. Continuous Integration means integrating code as frequently as possible; when you make all your changes in trunk you get the maximum continuous integration.
But branching is something you cannot completely avoid; you have to branch when you want to do something for a limited duration without affecting trunk. The point is that during the life of the branch you lose continuous integration. So to branch, you need a reason that has more merit than keeping continuous integration for that period. The decision to branch has to be taken by somebody mature enough to weigh the pluses and minuses for that particular case.
Under this 'branches are evil' concept, the obvious issue with Git is that everybody has a local copy (not just a branch, but a completely separate repository for each person) and works on that. This is the complete opposite of continuous integration. In fact, the network of trust is the complete opposite of continuous integration; in continuous integration your motto is to test and prove that things work rather than just trust.
Can't we really have both (Network of Trust and Continuous Integration)?
Well.. maybe.. maybe not. When you try to have both, you really do not have either. But a point worth mentioning is that Git can be used in a centralised way; this means you commit to the main repository as frequently as you should. But Git is really built for the network of trust model, so it is always tempting to follow that model when you use Git.
Network of Trust model versus Continuous Integration
The Git versus Subversion discussion is really about which model you want to follow. If you think your project fits the network of trust, then using Git is a no brainer. If you want a centralised repository where the team commits as frequently as they can and continuous integration checks for integration issues and notifies the team on the spot, then you had better stay with Subversion.
My humble conclusion: If you are on a huge project modeled as a network of trust, then using Git is a no brainer; go for it, there is nothing other than Git for your project. If you are on a very small project where you work on trunk almost all the time, then it is easier for you to manage things with simple Subversion than with Git. If you are on a project which needs merging quite often, then Git comes in handy even if you do not use all the jazzy features of Git. If you are working with developers of average ability, do not make things complex by bringing Git into your project; you had better stay with Subversion.