Friday, December 28, 2012

Mapping the merge conflict frontier

 

In my last article, I described how to define a "conflict frontier" for a failed merge and conceptually how to diagram it.  In this article, I will describe an automated and efficient way to compute the conflict frontier using git.  In my next article, I will show some examples of real-world merge diagrams.

Remember, a merge diagram summarizes the combinations of commits from two branches that can be merged successfully, and which of them conflict.  The example used in the previous article looked like this:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   x     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch

In this diagram, commits "0"-"11" are on master and commits "A"-"I" are on the branch.  The rest of the diagram shows incremental merges when they are possible; e.g., commit "C3" is a successful merge between commit "C2" and commit "B3", and it contains the logical changes that were originally introduced by commits "0"-"3" on master plus those introduced by commits "A"-"C" on branch.  When an incremental merge failed, the diagram shows an "x".  The "conflict frontier" is the border between merges that are successful and those that result in conflicts.

This article describes an algorithm for computing the merge frontier efficiently.

Assumptions

In this article we make the following assumptions:

  1. If a merge can be done directly, then all of the incremental merges above and to the left of the merge can also be done successfully. For example, if we can merge "C" and "3" directly, then we assume that all of the merges "A1", "A2", "A3", "B1", "B2", "B3", "C1", and "C2" could also be done successfully.  This is typically (though not always) the case in the real world.
  2. If a direct merge conflicts, then all of the incremental merges below and to the right of it would also conflict.  For example, if a merge of "F" and "2" fails, we assume that the whole rectangle of incremental merges bounded by "F2", "F11", "I11", and "I2" would also conflict.  Again, this is typically (though not always) the case in the real world.
  3. We use "git merge" to define whether a merge is successful.  If the command completes without an error, then we assume that the merge is good.  (For a more accurate determination of which merges are successful, one could also check whether the merged version builds and passes the project's automated tests.)

Bisection

Given the above assumptions, it is possible to locate the merge frontier in any line or column of the diagram via a process of bisection.  For example, to find the merge frontier in row "E" of the diagram:

  1. Test merging "E" and "6" -> SUCCESS.  Merge frontier must be to the right of this entry.
  2. Test merging "E" and "9" -> CONFLICT.  Merge frontier must be to the left of this entry.
  3. Test merging "E" and "7" -> FAIL.  Merge frontier must be between "E6" and "E7".

In the diagram, the bisection look like this:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E                            E6   x         x
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

Filling

Given the results of a bisection, assumptions 1 and 2 allow us to fill in big chunks of the merge diagram.  In particular, since merge "E6" was successful, all of the merges in the rectangle above and to the left of "E6" are also successful.  And since "E7" conflicts, all of the merges in the rectangle below and to the right of "E7" are also conflicts:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |
    A - A1 - A2 - A3 - A4 - A5 - A6
    |   |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |
    F                                 x    x    x     x     x
    |
    G                                 x    x    x     x     x
    |
    H                                 x    x    x     x     x
    |
    I                                 x    x    x     x     x

    ↑
  branch

The combination of bisection and filling allows the merge frontier to be computed very efficiently.

The zig-zag method

There are many ways to map out the frontier using bisection and filling.  The one that I use I call the zig-zag method, because it starts at the bottom-left and zig-zags up the frontier until it reaches the top-right.  In pseudo-code, it is roughly:

i1 = 0
i2 = number of commits on branch
while True:
    i1 = index of first failing merge in row i2-1
    mark block above and to the left of (i1,i2) as "successful"
    if i1 == number of commits on master:
        break

    i2 = index of first failing merge in column i1
    mark block below and to the right of (i1,i2) as "conflict"
    if i2 == 0:
        break

Whenever the "index of first failing merge" is needed for a row or a column, it is found by bisection.  For our example, the progression looks like this (with the position (i1,i2) marked as *):

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |
    A - A1
    |   |
    B - B1
    |   |
    C - C1
    |   |
    D - D1
    |   |
    E - E1
    |   |
    F - F1
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1   *

    ↑
  branch


o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |
    A - A1
    |   |
    B - B1
    |   |
    C - C1
    |   |
    D - D1
    |   |
    E - E1
    |   |
    F - F1   *    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch


o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |
    A - A1 - A2 - A3 - A4 - A5 - A6
    |   |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   *
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch


o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |
    A - A1 - A2 - A3 - A4 - A5 - A6
    |   |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   *    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch


o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   *
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch


o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   *     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch


o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11   *
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   x     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch

Using this algorithm, the whole diagram can be computed by doing only O(B * lg max(M,N)) test merges, where B is the number of blocks in the resulting diagram and M and N are the number of commits on master and branch, respectively.  In the worst case, B = min(M,N) and the algorithm scales like O(min(M,N) * lg max(M,N)).

I have written an example program, git-mergemate, that computes merge diagrams using the algorithm described here.  To use it, put it in your $PATH, switch to your git project working directory (which must be clean), and type

git-mergemate diagram master...branch >diagram.ppm

The output is a PPM file in which successful merges are displayed as green pixels and conflicting merges are displayed as red pixels. Pixels representing merges that were actually tested are displayed as bright red/green; the other pixels are inferred using the assumptions above.

My next article will show some examples of real-world merge diagrams.

Some technical details

  1. Each of the bisect steps only needs to search the part of the row/column that hasn't already been filled in.
  2. If the merges are done using git, it is important to do them with git's "rerere" feature turned off.  Otherwise the results of test merges are not deterministic because they depend on what information "rerere" has already gathered.

No comments: