Sunday, August 9, 2009

Rebase with history -- implementation ideas

In my last blog posts, I discussed and idea for retaining history when rebasing and a concrete example of how it improves on traditional rebasing without history. In this third post in the series, I would like to discuss how rebase with history could be implemented.

First, a reminder of what a simple rebase-with-history looks like:

M0----M1----M2           M0----M1----M2
  \                        \           \
   \               -->      \           B1'---B2'
    \                        \         /     /
     B1----B2                 -------B1----B2

The work B1+B2 on a feature branch as been rebased onto the current version M2 of the master branch, producing two rebased revisions B1' and B2'. A traditional rebase-without-history would discard commits B1 and B2, whereas the point of rebase-with-history is to retain them.

A rebase-with-history can be done manually, today, in any of the DAG-based DVCSs. The algorithm is basically:

  1. Ensure that the old parent of the rebased branch (M0 in the example above) is a direct ancestor of the new parent (M2).
  2. Create a new temporary branch at M2.
  3. For each commit that is to be rebased, merge that commit onto the temporary branch, resolving conflicts if necessary. This creates merge commits B1' then B2'.
  4. Designate the last new merge commit (B2') at the head of the temporary branch to be the new head of the feature branch.

Avoiding information overload

Earlier I argued that the full DAG resulting from a rebase-with-history contains more detail than the average user wants to see. After all, many people are happy with the current style of rebase-without-history, which completely discards B1 and B2. So how can we reduce the visual clutter without losing the historical information that is needed to support future merges? In other words, assume that the VCS wants to store the above DAG, but the typical user typically wants to see this:

M0----M1----M2
              \
               B1'---B2'

How can this dualism be implemented?

It is clear that the commits B1 and B2 have to be retained in the DAG rather than recorded in some kind of supplemental metadata. Otherwise, many VCS commands would have to be modified to be aware of this metadata and treat B1 and B2 as if they were in the DAG. It is also necessary to prevent B1 and B2 from being garbage collected. So it is simpler to retain them in the DAG proper.

It is also clear that somehow marking the pre-rebase commits B1 and B2 themselves "historical" is not correct. After all, those commits may be integral to other branches in which they are not "historical". Perhaps the rebased version of the branch will turn out to be a false start and never even be pushed upstream. So it is not the commits themselves that are "historical"; it is only their relationship with the rebased commits B1' and B2' that is "historical".

And that, of course, suggests the correct solution. Commit B1' has two parents, M2 and B1. M2 is always interesting and should be recorded as a normal parent. But B1 is interesting mainly as the historical predecessor of B1', but usually not as part of the visible history in the upstream repository. Therefore, within the B1' node, B1 should be recorded as a parent but with a special "historical" flag.

Similarly, commit B2' has two parents--B1', which is a "normal" parent, and B2, which should be recorded as a "historical" parent.

The "historical" flag would be mainly a hint to GUI-related commands that the linked-to parent can be suppressed in most user views. When walking the DAG for display purposes, "historical" parents would simply not be followed. The historical commits B1 and B2 are only connected to the branch head via "historical" parent links, so they will not be traversed. By this mechanism, the DAG of the branch is displayed just like the DAG of a rebase-without-history. Of course, if commits B1 and B2 are non-historical ancestors of other branches, they will be displayed normally when the ancestry of those branches is displayed.

The GUI-related commands should have an extra --full-history flag to override the default behavior, causing historical parents to be displayed along with non-historical parents. (Optionally, historical links could be flagged somehow to allow them to be colored differently, etc.)

CAVEAT: Please recall that this whole discussion only applies to the "feature branch tracking upstream" class of rebase operations, namely those where:

  • the order of commits in the branch is unchanged, and
  • the new location of the branch is a descendant of the old location of the branch.

However, this is a common and important class of rebases. Other types of rebases (e.g., those that change the order of commits) cannot be done with history retention, because there is no way to represent such history within a DAG model. (darcs uses different technology and therefore doesn't share this limitation.)

Feedback: I would be very happy to receive feedback about this proposal for "rebase with history". Would it be helpful? Are there corner-cases that would prevent its implementation? Where has it been discussed before? (There's nothing new under the sun, you know...) Please add comments!

[Modified 2009-08-13 to add an outline of the rebase-with-history algorithm.]

4 comments:

Unknown said...

Earlier discussions about this topic on the Git mailing list:



http://thread.gmane.org/gmane.comp.version-control.git/84959/



http://thread.gmane.org/gmane.comp.version-control.git/100533/focus=100556

Unknown said...

So (repeating the mailing list question), how's this different to topgit, pbranch, loom?

Michael Haggerty said...

bruce:

topgit, pbranch, and loom use similar techniques to those that I advocate here, but they are both more ambitious and more cumbersome to use. They require extensions to the base tools, extra commands, and explicit designation of branches that are then managed specially.

My proposal for rebase-with-history is for the DAG-based DVCSs to retain information that is already known to them that will improve their ability to deal with some common kinds of history "rewriting". The user needn't use any special commands and needn't even notice any difference whatsoever (except for easier merges and sharable rebases) unless s/he explicitly seeks out the extra history information.

Peter Arrenbrecht said...

The key idea here seems to be the suppression of uninteresting parents in the context of a descendant flagging them as such. For Mercurial, at least, I don't think this is going to help much. `hg log`, and also `hg glog`, do not really have a concept of such a context. They have, at most, a revision range to display. It would not be hard, though, to add logging commands that really trace the history from a starting revision, which could then benefit from this idea. (Disclaimer: I wrote pbranch.)