Friday, December 28, 2012

The conflict frontier of a nightmare merge

If you use feature branches in git, you know the importance of getting features (or featurettes) merged back to master as soon as possible. Otherwise, it is all too likely that the branch and master will grow so far apart that merging them becomes a nightmare.  You probably know that sick feeling in the pit of your stomach when you attempt a merge, only to be confronted with dozens or even hundreds of conflicts.  How can you possibly understand all of the conflicts, let alone disentangle them?  Sometimes it seems like there is nothing to do but abandon the branch entirely and start over.

Exactly this situation happened recently at $WORK, and it got me thinking about what can be done to attack the nightmare merge.

Explode the problem

The first step to solving a problem is to understand it, and (for me at least) the first step to understanding a problem is to visualize it somehow.  What does a nightmare merge look like?

Specifically, one doesn't learn very much from the fact that two large and widely-diverged branches don't merge successfully.  Let's explode the merge into many tiny, simpler merges to try to get a more intuitive idea of the situation.

Assume, for the sake of simplicity, that both master and the feature branch have developed linearly since they bifurcated [1]:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
     \
      A - B - C - D - E - F - G - H - I                ← branch

Now, the fact that master and branch cannot be merged means that somewhere among the commits on master there are one or more changes that conflict with one or more changes somewhere on branch.  But the majority of the changes probably do not conflict.  Let's find the problematic pairs so that we can focus our attention on them.

We start by drawing the above diagram a little bit differently, with the branch drawn downwards at a 90 degree angle to master:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

Now let's divide and conquer by merging individual pairs of commits, one at a time.  The first merge is between the first post-bifurcation commits on branch and master, "1" and "A":

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |
    A - A1
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

If we're lucky, the merge will go through without conflicts.  So let's merge some more branch commits with the first commit from master:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |
    A - A1
    |   |
    B - B1
    |   |
    C - C1
    |   |
    D - D1
    |   |
    E - E1
    |   |
    F - F1
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch

Woohoo!  Nothing on the branch conflicted with commit 1 from master. This is a little bit of progress.  For example, it means that we could probably cleanly rebase the branch to commit "1" (though that is not recommended).

Aside:

Please note that we haven't actually defined what it means for a merge to be "successful".  The simplest approach is simply to attempt the merge in git and see if it reports a conflict.  This is the approach that I will use.  But please note that even a merge that is textually successful is not necessarily correct: it might not build, or the merge might have introduced bugs, etc. It would be possible to define success more strictly, using criteria like

  • The merged source tree compiles without errors
  • The test suite runs correctly
  • The merged code passes manual inspection

Blocked!

Now let's start successively merging the first column of merge commits with commit 2 from master:

o - 0 - 1  - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |    |
    A - A1 - A2
    |   |    |
    B - B1 - B2
    |   |    |
    C - C1 - C2
    |   |    |
    D - D1 - D2
    |   |    |
    E - E1 - E2
    |   |    |
    F - F1 - x
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch

This time, we hit a problem: there was a conflict (marked with "x") when trying to merge commit "F1" with commit "E2".  What does this mean?

Commit "E1" essentially contains all of the changes from commits "A"-"E" on the branch plus the change from commit "1" on master.  This merge was successful.  If we denote the change introduced by commit "X" as "∆X", we can write (schematically):

E1 = 0 + ∆1      + ∆A + ∆B + ∆C + ∆D + ∆E           (successful)

"E2" contains everything in "E1", plus additionally it contains the change that was originally made in master commit "2".  Similarly, "F1" contains all of the changes from "E1", plus additionally the change that was originally made on branch commit "F".  Both merges "E2" and "F1" were also successful:

E2 = 0 + ∆1 + ∆2 + ∆A + ∆B + ∆C + ∆D + ∆E           (successful)
F1 = 0 + ∆1      + ∆A + ∆B + ∆C + ∆D + ∆E + ∆F      (successful)

But there was a conflict when merging "E2" and "F1":

F2 = 0 + ∆1 + ∆2 + ∆A + ∆B + ∆C + ∆D + ∆E + ∆F      (conflict)

How do we understand this conflict?  If we compare "E2", "F1", and "F2", we see that "F2" was the first time that the changes introduced in commit "2" came together with the changes introduced in commit "F". So the fact that "F2" conflicts indicates that a change made in commit "2" conflicts with a change made in commit "F".

Conflict breeds more conflict

Once there is a merge conflict, it blocks our progress.  If merge "F2" fails, then there is no way that we can merge "F2" and "G1" to create "G2" [2].  Nor is it possible to merge "F2" with "E3" to create "F3". Continuing this line of reasoning, we conclude that the one failed merge at "F2" implies that all of the merges in the rectangle below and to the right of it will also fail.  So let's add that information to the diagram:

o - 0 - 1  - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |    |
    A - A1 - A2
    |   |    |
    B - B1 - B2
    |   |    |
    C - C1 - C2
    |   |    |
    D - D1 - D2
    |   |    |
    E - E1 - E2
    |   |
    F - F1   x   x   x   x   x   x   x   x   x   x
    |   |
    G - G1   x   x   x   x   x   x   x   x   x   x
    |   |
    H - H1   x   x   x   x   x   x   x   x   x   x
    |   |
    I - I1   x   x   x   x   x   x   x   x   x   x

    ↑
  branch

Filling the rest of the diagram

However, that doesn't mean that we are completely done.  It is quite possible that a merge of "3" and "A2" will still be successful, and in general we can keep filling in the table in other directions until they are blocked by other merge conflicts.  The resulting diagram might look like this:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   x     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch

The conflict frontier

What does a diagram like this tell us?

Even when it is impossible to merge a branch back to master directly, it is usually possible to merge various parts of the branch with parts of master.  This is logical; not every change on the branch necessarily conflicts with every change on master.

But there is a "conflict frontier" between the merges that are successful and the merges that conflict.  This frontier has a very particular shape: the conflicted region is in the bottom-right corner of the diagram, and it consists of rectangles jutting towards the upper-left.

The apexes of these rectangles (i.e., "F2", "C7", and "B9" in our example) indicate the earliest pairs of changes, one on each branch, that conflict with each other (i.e., "F" vs. "2", "C" vs. "7", and "B" vs. "9").

In future articles I will present an algorithm for efficiently mapping out the conflict frontier, show some examples of real-world merge diagrams, and discuss how to use incremental merging to attack the nightmare merge.

Notes:

[1]

There are two obvious ways to generalize this approach when the histories of master and branch are not linear:

  1. Merge the individual commits anyway, preserving the topology of each branch as you go.  I think this should work, though I haven't actually tried it.  But in any case, it would be difficult to visualize the result.
  2. Apply the described procedure to only the main backbone of commits on each branch, namely the commits along the first-parent chain from each branch tip to the branch point. Using git, this would correspond to the output of git log --first-parent.  This is the approach that I will take in future articles.
[2]Strictly speaking, even though merge "F2" failed, it might still be possible to merge "G" and "2".  For example, commit "G" might have reverted commit "F", meaning that "G2" is textually identical to the successful merge "E2".  But in the usual case, the failure of one merge means that all merges below and to the right of it will fail, too.

No comments: