Sunday, August 9, 2009

Upstream rebase Just Works™ if history is retained

The git-rebase documentation has a section entitled "Recovering from Upstream Rebase" describing how to straighten out the mess that results when an upstream repository does a rebase. In my last blog post I described how simple rebases can be done with history, and claimed that this makes it possible to share a repository that has been rebased. In this post, I would like to show how the example from the git-rebase manual page works out if rebase history is retained.

First, the example without history. The premise is that you are working on a topic branch that is dependent on a subsystem that somebody else is working on:

m---m---m---m---m---m---m---m  master
     \
      o---o---o---o---o  subsystem
                       \
                        *---*---*  topic

Then the subsystem developer rebases subsystem onto master:

m---m---m---m---m---m---m---m  master
     \                       \
      o---o---o---o---o       o'--o'--o'--o'--o'  subsystem
                       \
                        *---*---*  topic

If you now merge the topic branch to the subsystem branch, a mess results:

m---m---m---m---m---m---m---m  master
     \                       \
      o---o---o---o---o       o'--o'--o'--o'--o'--M  subsystem
                       \                         /
                        *---*---*-..........-*--*  topic

The first problem here is that git has no idea that the two versions of the subsystem revisions, namely "o-o-o-o-o" and "o'-o'-o'-o'-o'", actually contain the same logical changes. Therefore, the merge that creates commit M can result in spurious conflicts. Specifically, git will try to combine the changes in "o-o-o-o-o" with those in "o'-o'-o'-o'-o'", and may or may not be able to make sense out of the situation.

Second, in some (admittedly rather unlikely) situations, git's ignorance can cause problems with future merges.

Finally, there is no easy way to convert the merged topic branch into a series of patches that apply cleanly to the rebased subsystem branch and can therefore be submitted upstream. In this case, the only available patch that applies cleanly to the rebased subsystem branch is the merge patch M, which is a single commit that squashes together the entire topic branch and is therefore difficult to review.
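The duplication behind these problems can be reproduced in a throwaway repository. The following is only a minimal sketch: the `commit` helper, the file names, and the shortened branch lengths (two commits per branch instead of five) are assumptions made for illustration.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing conflicts
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit m1; commit m2
git checkout -q -b subsystem; commit o1; commit o2
git checkout -q -b topic;     commit t1; commit t2
git checkout -q master;       commit m3

# The subsystem developer rebases onto master without retaining history.
git rebase -q master subsystem

# Merging the un-rebased topic branch drags the old "o" commits back in.
git checkout -q subsystem
git merge -q --no-edit topic

# Both the old and the rebased versions of o1 and o2 are now in the history.
git log --format=%s master..subsystem
```

In this sketch the merge happens to succeed, because the old and new commits add identical files. The log still lists each of o1 and o2 twice, once for each version.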

In fact, in the git world you typically wouldn't merge the topic branch into the subsystem branch. Instead, you would rebase the topic branch onto the rebased subsystem branch:

m---m---m---m---m---m---m---m  master
                             \
                              o'--o'--o'--o'--o'  subsystem
                                               \
                                                *'--*'--*'  topic

This gives a usable result, but requires extra effort by the topic branch maintainer (and the maintainers of any other branches that rely on the subsystem branch or the topic branch). This rebase is also prone to conflicts unless you explicitly tell git what interval of revisions to rebase, using a command like git rebase --onto subsystem SHA1 topic, where SHA1 is the last commit in the pre-rebase "o-o-o-o-o" series. You have to determine that commit yourself. Rebase-with-history relieves the user of this kind of manual bookkeeping.
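To make the bookkeeping concrete, here is a minimal sketch of the manual recovery in a throwaway repository. The `commit` helper, the file names, and the `old_tip` variable are assumptions for illustration. In a real clone you would have to dig the old tip out of the reflog or look it up by hand instead of recording it in advance.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing conflicts
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit m1; commit m2
git checkout -q -b subsystem; commit o1; commit o2
git checkout -q -b topic;     commit t1; commit t2
git checkout -q master;       commit m3

# Record the pre-rebase tip of subsystem; this plays the role of SHA1 in the text.
old_tip=$(git rev-parse subsystem)

# The subsystem developer rebases onto master without retaining history.
git rebase -q master subsystem

# Replay only the commits after the old tip (the "*" commits) onto the new subsystem.
git rebase -q --onto subsystem "$old_tip" topic

# Only the topic commits sit on top of the rebased subsystem branch.
git log --format=%s subsystem..topic
```

The important argument is `"$old_tip"`: it tells git where the topic's own commits begin, which is exactly the information that is lost when rebase history is discarded.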

If, instead, the subsystem maintainer had retained history when rebasing as described in my earlier post, the DAG would look like this:

m---m---m---m---m---m---m---m  master
     \                       \
      \                       o'--o'--o'--o'--o'  subsystem
       \                     /   /   /   /   /
        --------------------o---o---o---o---o
                                             \
                                              *---*---*  topic

Now you are free to merge the topic branch with the subsystem branch:

m---m---m---m---m---m---m---m  master
     \                       \
      \                       o'--o'--o'--o'--o'  subsystem
       \                     /   /   /   /   / \
        --------------------o---o---o---o---o   ---------
                                             \           \
                                              *---*---*---x  topic

The fact that both the old and new versions of the subsystem revisions are still in the DAG is not a problem, because the dependencies make it clear to git that the old version is already incorporated into the new version, so git will never try to merge them again. But this is also not ideal, because it still doesn't allow the topic branch to be converted into a tidy patch series.

Better still, you can rebase your changes onto the subsystem branch, preferably also retaining rebase history:

m---m---m---m---m---m---m---m  master
     \                       \
      \                       o'--o'--o'--o'--o'  subsystem
       \                     /   /   /   /   / \
        --------------------o---o---o---o---o   \
                                             \   \
                                              \   *'--*'--*'  topic
                                               \ /   /   /
                                                *---*---*

Whether you retain history during this rebase or not, the rebase is simple because git can figure out for itself what range of revisions to rebase. The result is also a well-formed repository, and will not create any problems for other developers who might have based their work on your topic branch. And when you are ready, you can submit the patches "*'-*'-*'" as-is to the subsystem maintainer.
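The claim that git can work out the range for itself can be checked with stock git commands. The sketch below emulates a history-retaining rebase by hand, giving each rebased commit its pre-rebase counterpart as a second parent via `git cherry-pick -n` and `git commit-tree`. This emulation is an assumption chosen to reproduce the DAG shape shown above, not necessarily the mechanism described in the earlier post. The `commit` helper and file names are likewise hypothetical.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing conflicts
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit m1; commit m2
git checkout -q -b subsystem; commit o1; commit o2
git checkout -q -b topic;     commit t1; commit t2
git checkout -q master;       commit m3

# Emulated rebase-with-history: each new commit o' gets two parents,
# the previous new commit (or master) and the old commit o it replaces.
old_tip=$(git rev-parse subsystem)
git checkout -q --detach master
for old in $(git rev-list --reverse master..subsystem); do
    git cherry-pick -n "$old"                 # apply the change without committing
    tree=$(git write-tree)                    # snapshot the resulting index as a tree
    msg=$(git log -1 --format=%s "$old")
    new=$(git commit-tree -p HEAD -p "$old" -m "$msg" "$tree")
    git reset -q --hard "$new"                # advance the detached HEAD to the new commit
done
git branch -f subsystem HEAD

# The old tip is now an ancestor of subsystem, so no SHA1 has to be supplied:
git merge-base --is-ancestor "$old_tip" subsystem
git rebase -q subsystem topic

git log --format=%s subsystem..topic
```

Because the old "o" commits are reachable from the new subsystem tip, the range `subsystem..topic` contains only the topic's own commits, and a plain `git rebase subsystem topic` replays just those.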

Remember that just because the old and new versions of rebased revisions are included in the DAG doesn't mean that they have to be displayed in the UI all of the time. If there were a way to mark the old versions "uninteresting", then the "interesting" part of the DAG could be displayed just like in the rebase-without-history case:

m---m---m---m---m---m---m---m  master
                             \
                              o'--o'--o'--o'--o'  subsystem
                                               \
                                                *'--*'--*'  topic

but the retention of the hidden pre-rebase commits enables future rebases and merges to be carried out smoothly.

[Modified 2009-08-16 based on discussion on the git mailing list; thanks especially to Björn Steinbrink for his comments.]

