Sunday, May 5, 2013

One merge to rule them all

In earlier articles, I explained how to break up a difficult merge into simple pairwise merges, how to efficiently locate the pairwise merges that block the full merge, how to make pictures of the merge conflict frontier, and how to complete a full incremental merge. What can we do with an incremental merge once it is complete?
An incremental merge, once completed, contains all of the information you could possibly want about combining two branches. Specifically, it records how to do each of the N*M pairwise merges, where N and M are the number of commits on each branch. Each of these submerges is stored as a git commit with an ancestry that correctly reflects its contents. In fact, an incremental merge might contain more information than you want to store permanently in your repository. In this article, I will explain how to use an incremental merge to derive a simple merge, a rebase (in either direction), or a rebase-with-history.
In later articles I will provide a table comparing incremental merge with simple merge and simple rebase, explain how to compute an incremental merge more efficiently, and explain git-imerge, a new tool that automates incremental merging.

The starting point

For the purposes of this discussion, we will assume that we have already completed a full incremental merge as described in my previous article and that it looks like this [1]:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11

    ↑
  branch
Remember that commits "0"-"11" are on master, and commits "A"-"I" are on the branch. (It goes without saying that there is nothing special about master; any two branches can be merged in this manner.) The rest of the commits are pairwise merges between the neighbor above and the neighbor to the left. For example, commit "C3" is the pairwise merge between "B3" and "C2", and it includes all of the changes through master commit "C" and branch commit "3".
The most interesting commit is "I11". It contains all of the changes through master commit "11" and branch commit "I". In other words, its contents are the merge of "master" and "branch". By resolving simple pairwise conflicts, we have managed indirectly to resolve the full conflict.
Incremental merging is basically a divide and conquer approach to merging. It replaces the task of resolving one big hairy merge conflict with the task of resolving many little conflicts. Since big conflicts are much harder to resolve than small conflicts, this can be a huge win, and can even make it possible to perform incrementally a merge that would be intractable in a single step.

Using the results of an incremental merge

Now that we have a complete incremental merge, how can we use the results?
In following sections I will explain how to winnow the information from an incremental merge down into any one of the following:
Obtaining one of these results is done by rewriting history to selectively discard the intermediate commits that are not wanted, while adjusting the parentage of the commits that are retained.

Simple merge

If your goal is a simple merge of branch into master, (with none of the intermediate results), then all you need from the incremental merge is commit "I11". Specifically, you need a single, new merge commit with the contents of "I11" but with commits "11" and "I" as parents:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11
    |                                                       |
    A                                                       |
    |                                                       |
    B                                                       |
    |                                                       |
    C                                                       |
    |                                                       |
    D                                                       |
    |                                                       |
    E                                                       |
    |                                                       |
    F                                                       |
    |                                                       |
    G                                                       |
    |                                                       |
    H                                                       |
    |                                                       |
    I ---------------------------------------------------- I11'  ← master

    ↑
  branch
It may be easier to recognize this diagram when it is redrawn in the conventional orientation:
o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 - I11'  ← master
     \                                               /
      A -- B -- C --- D --- E --- F --- G --- H --- I       ← branch

Rebase of "branch" onto "master" or vice versa

If you prefer to rebase "branch" onto "master", what you want are the contents of commits "A11" through "I11", but with their second parents omitted to convert them from merge commits into simple, linear commits:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
                                                            |
                                                           A11'
                                                            |
                                                           B11'
                                                            |
                                                           C11'
                                                            |
                                                           D11'
                                                            |
                                                           E11'
                                                            |
                                                           F11'
                                                            |
                                                           G11'
                                                            |
                                                           H11'
                                                            |
                                                           I11'  ← branch
Similarly, it is possible to use the results to rebase "master" onto "branch" (i.e., as if the changes made on "branch" had actually been made earlier in the history of "master") by retaining commits "I1".."I11" with their first parents omitted:
o - 0
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I - I1'- I2'- I3'- I4'- I5'- I6'- I7'- I8'- I9'- I10'- I11'  ← master

    ↑
  branch

Rebase with history

It is also possible to rebase "branch" onto "master", but retaining part of the history. This is a very useful hybrid between rebasing and merging, having the advantages that it retains both the old and the new versions of the rebased commits, and that it is a valid way to rebase already-published history without the known problems of a simple rebase [2]. A rebase with history is extracted from a full incremental merge by retaining commits "A11" through "I11", but adjusting their second parents to point at "A" through "I" respectively:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |                                                       |
    A ---------------------------------------------------- A11'
    |                                                       |
    B ---------------------------------------------------- B11'
    |                                                       |
    C ---------------------------------------------------- C11'
    |                                                       |
    D ---------------------------------------------------- D11'
    |                                                       |
    E ---------------------------------------------------- E11'
    |                                                       |
    F ---------------------------------------------------- F11'
    |                                                       |
    G ---------------------------------------------------- G11'
    |                                                       |
    H ---------------------------------------------------- H11'
    |                                                       |
    I ---------------------------------------------------- I11'

                                                            ↑
                                                          branch
Analogously, it is possible to rebase "master" onto "branch" with history.

Maybe keep everything?

As we have seen, a full incremental merge is quite powerful: it contains enough information to derive all of the common rebase/merge results as special cases.
Come to think of it, why should we have to decide between merge vs. rebase at all? Why not retain all of the intermediate commits in our history? Retaining the full merge history is simple; we just reset the "master" reference to "I11" without rewriting any commits:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11   ← master

    ↑
  branch
The resulting repository holds complete information about how the individual commits were pairwise merged, which conflicts had to be resolved manually, and how they were resolved (hopefully explained in the individual log messages). It contains (as special cases) all of the information that would be recorded for a simple merge, as well as all of the information that would be retained by a rebase of "branch" on "master" or vice versa, with or without history.
However, retaining the whole history of the incremental merge, though arguably the correct approach, causes the permanent retention of a lot of intermediate commits that will--in practice--probably never be needed. Moreover, some git tools (like gitk) aren't able to display such a complicated history very well. Therefore, it is probably best to use incremental merge as a tactic to achieve a more limited result.

Still to come

In future articles I plan to present a sparse variant of the full incremental merge and describe git-imerge, a tool that automates incremental merging.

Notes:

[1]In most cases it is overkill to complete the full incremental merge, and a sparser variant would be sufficient. But it is easier to understand sparse incremental merges after we have analyzed and discussed the full version. So this article we assume that the full incremental merge is available.
[2]The problems of rebasing published work is described, for example, in the git-rebase(1) manpage section "Recovering from Upstream Rebase".

No comments: