Thursday, August 13, 2009

Git, Mercurial, and Bazaar—simplicity through inflexibility

Git, Mercurial, and Bazaar, the most prominent DAG-based distributed version control systems, have rightly earned a reputation for making branching and merging easy enough to do every day. And their proponents often deride Subversion for making branching easy but merging difficult.

What is the magic pixie dust that allows DVCSs to merge so effortlessly? And why is Subversion too stupid to get it right? (And how can the author of this blog possibly use the words "simplicity" or "inflexibility" in the same breath as "git"???)

Partisanship disclaimer: I like both Subversion and git and use them both daily, and I expect that I would like Mercurial and Bazaar if I had time to work with them. I have contributed patches to the testing frameworks of both Subversion and git, and I am the maintainer of cvs2svn/cvs2git. In other words, I'm a fan of all of the VCSs discussed in this article.

Contrary to what you might think, DVCSs' merging prowess is not because those projects have rocket scientists developing infallible merge algorithms. Their algorithms work better, but not radically so. Merge conflicts are an unavoidable aspect of parallel development, and nasty ones will occur regardless of what VCS is being used.

Rather, it is because the DAG-based DVCSs cannot represent complicated histories and therefore do not have to merge them.

Inflexible DVCSs

To understand merging in DAG-based VCSs, it is necessary to understand the Fundamental Law of DAG-based VCSs:

Every change made by every commit that is an ancestor of a given revision is assumed to be incorporated into that revision.
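As a rough illustration of the Fundamental Law, the set of changes that a revision is assumed to contain is simply the set of commits reachable from it through parent links. The sketch below uses a made-up five-commit history; the commit names and the `ancestors` helper are hypothetical and do not come from any real VCS API.

```python
# Hypothetical toy history: each commit maps to the list of its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # a commit on one branch
    "D": ["B"],       # a commit on another branch
    "E": ["C", "D"],  # a merge commit with two parents
}

def ancestors(commit, parents):
    """Return every commit reachable from `commit` via parent links, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

# Under the Fundamental Law, E is assumed to incorporate all five commits,
# while C is assumed to incorporate only A, B, and itself.
print(sorted(ancestors("E", parents)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(ancestors("C", parents)))  # ['A', 'B', 'C']
```

There is no way in this model to say "E contains D but not B": reachability is all-or-nothing.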

The Fundamental Law simplifies merging because it vastly restricts the types of histories that can be represented in the VCS. For example, the DAG cannot represent the history of

  • Cherry picking, in which some (but not all) commits from one branch are incorporated into another branch
  • Rebases in which the new parent is not a descendant of the old parent
  • Rebases in which the order of the commits is changed
  • Merging changes from branch to branch at a less-than-full-tree granularity; for example, merging the changes to file1 but not the changes to file2

Of course I am not claiming that the DVCSs do not support these operations. Rather, I am claiming that their support, where it exists, is broken because the DAG cannot properly record the histories of such operations. That is why, for example, it is problematic to rebase branches that have been published (though for some kinds of rebases there may be a solution). It is also why it is common to have conflicts when merging two branches if some content has already been cherry-picked from one branch to the other.
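The cherry-pick case can be sketched with a toy model. This is only a model of what the graph records, assuming each commit carries a single labeled change; real merge tools compare trees against a merge base rather than replaying commits one by one. All names here (`parents`, `change`, `commits_to_merge`) are hypothetical.

```python
# Toy history: "m1" is a cherry-pick of the change in "t1" onto main.
# The DAG stores it as a brand-new commit with no link back to "t1".
parents = {
    "base": [],
    "t1": ["base"],  # topic branch
    "t2": ["t1"],    # topic branch
    "m1": ["base"],  # main branch (cherry-picked from t1)
}
change = {
    "base": "init",
    "t1": "fix-typo",
    "t2": "add-feature",
    "m1": "fix-typo",  # same change as t1, but the graph cannot say so
}

def ancestors(commit):
    """Return every commit reachable from `commit`, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def commits_to_merge(target, source):
    """Commits the graph considers present in `source` but missing from `target`."""
    return ancestors(source) - ancestors(target)

to_merge = commits_to_merge("m1", "t2")
already_applied = {change[c] for c in ancestors("m1")}
reapplied = sorted(c for c in to_merge if change[c] in already_applied)

print(sorted(to_merge))  # ['t1', 't2']
print(reapplied)         # ['t1']
```

According to the graph, `t1` still has to be merged into main even though its change is already there, which is where the spurious conflicts come from.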

Flexible Subversion

Subversion, on the other hand, completely earned its reputation for making merging difficult, primarily because it didn't record merges at all. That made merging either a nightmare of manual record-keeping or dependent on third-party tools like svnmerge.py. But that was before Subversion release 1.5.

Starting with release 1.5, Subversion, ironically, supports a much more flexible model of merging than the DAG-based DVCSs. Changes from any commit can be merged to any branch at the single-file level of granularity, enabling all of the operations listed above and some even weirder things (for example, a change that was originally applied to one file can be "merged" onto a completely different file). If your workflow demands this sort of thing, Subversion might hold significant advantages for you.
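To make the contrast with the DAG concrete, here is a deliberately simplified sketch of per-path merge tracking. It is not Subversion's actual data format or API; the `mergeinfo` structure, the paths, and the revision numbers are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical model: mergeinfo[target_path][source_path] is the set of
# source revisions whose changes have already been merged into the target.
mergeinfo = defaultdict(lambda: defaultdict(set))

def record_merge(target_path, source_path, revisions):
    """Note that `revisions` of `source_path` were merged into `target_path`."""
    mergeinfo[target_path][source_path].update(revisions)

def unmerged(target_path, source_path, candidate_revisions):
    """Return the candidate revisions not yet merged into `target_path`."""
    return sorted(set(candidate_revisions) - mergeinfo[target_path][source_path])

# Cherry-pick r5 and r7 from the branch, for file1 only.
record_merge("trunk/file1", "branches/topic/file1", [5, 7])

print(unmerged("trunk/file1", "branches/topic/file1", range(4, 9)))  # [4, 6, 8]
print(unmerged("trunk/file2", "branches/topic/file2", range(4, 9)))  # [4, 5, 6, 7, 8]
```

Because the record is kept per path and per revision, it can express "r5 and r7 of file1, but nothing of file2", which a single parent link in a DAG cannot. The price is that this bookkeeping has to be stored, consulted, and kept correct on every merge.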

But there are also many disadvantages to Subversion's flexibility:

  • Subversion's merging model is more complicated than that of DAG-based VCSs, and therefore more complicated to implement and less predictable.
  • It is much harder to visualize the history of a Subversion project (contrast that to DVCSs, whose history can be displayed as a single DAG).
  • Subversion merges are innately slow, because of the large quantities of metadata that have to be manipulated.
  • The bookkeeping of SVN merge info requires more user conscientiousness, and mistakes are not as easy to spot and fix.

Simplicity vs. flexibility

Of course, each of these systems has dramatic strengths and weaknesses related to implementation quality, user interface, etc. But on the single criterion of how merge history is recorded, I claim that git, Mercurial, and Bazaar are fundamentally simpler than Subversion because they are less flexible.

So, does Subversion's flexibility mean that it is better than the DVCSs? Definitely not. Many projects don't need the flexibility offered by Subversion—or even worse, they mistakenly think that they need it. For such projects, the simpler DAG-based model is adequate and will make branching and merging go more smoothly. Other projects, particularly those that rely heavily on cherry-picking, might consider whether Subversion, or perhaps a DVCS like darcs that has a different model, might be a better fit. And there are certainly other considerations, besides the history model, that will influence the decision of which VCS to use.

Me? I'm going to continue using both.

4 comments:

Unknown said...

Great post on the insights behind DVCS. I've been trying to wrap my head around the philosophy behind merging vs rebasing, and this post has proved to be just what I needed. :)

Matth said...

You might also want to look at darcs, which can represent all the things you mention, and make powerful inferences from them in a much more elegant way than svn or git.

Works because it knows about which patches commute with which others, and so it can do a bunch of algebra with patches and treat them modulo reorderings of patches that commute. Rather than just representing as a point in a fixed ordering in a DAG, or an SVN revision with some merging metadata tacked onto it.

Has its downsides too but worth familiarising with...

Andreas Krey said...

Quite late to the game, but... Subversion has a theoretically superior merging model b/c it can track information that a DAG-based system simply can't.

BUT: The actual implementation is still broken. Because of the implementation of rename as copy&delete the most simple merge scenarios already create superfluous conflicts. The representation used for merge tracking can not represent cyclic merges (to and fro branch, or around a set of branches) at all, causing further complications. And there is no sign that either of these are going to be fixed anytime soon.

Michael Haggerty said...

@Andreas Krey: I agree that Subversion's merging is completely broken in the presence of file moves. But I don't agree that the problem is "the implementation of rename as copy&delete". git doesn't record file renames either--in fact, it does not even record file copies (a rename is just an add&delete)--and yet seems quite capable at merging. git deduces file renames at merge time from less information than Subversion has available.

To be fair, the kind of merges that a DAG-based system is asked to do are intrinsically easier than those attempted by Subversion. But even simple (DAG-like) merges are botched by Subversion if they involve file renames.