SoftwareSwirl

Monday, September 12, 2022

Beyond three-way merge

In this article, I explain three-way merge, scalar merge, and Git's "recursive merge" algorithm. I analyze how recursive merge fares in various simple criss-cross merge scenarios, and find several cases where it produces questionable results or unnecessary conflicts.

Background

This article is going to be for the real merge nerds. To understand it, you will have to understand what a merge base is and why it is significant, and possibly also something about git-imerge and incremental merge diagrams.

A big part of using version control in general, and Git in particular, is merging branches together. Consider the following Git history, where letters and numbers denote commits and time progresses from left to right (→):

---0---1---2---3        ← main
    \
     A---B---C          ← branch

If the changes that have been made on branch (A, B, and C) are OK, then the next task is to merge branch into main. The problem is that other work (1, 2, and 3) has been added to main since the work on branch was started. So the changes on main and the changes on branch have to be merged together, producing a merge commit M:

---0---1---2---3
    \           \
     \           M
      \         /
       A---B---C

Usually, this is done using the three-way merge algorithm. Three-way merge considers only three states of the project: the tip of each branch (i.e., 3 and C), and the "merge base". The merge base is, roughly speaking, the most recent that was on both of the branches that are being merged. In this case, it is commit 0.

So let's simplify the diagram by omitting intermediate commits:

---0 ⋯ 3
    ⋱   \
     C - M

and then deform it to a rectangular grid, which will make it easier to recognize patterns:

0 ⋯ 3
⋮   |
C---M

Now, time progresses downwards and to the right (↘)

Scalar merge

Let's simplify the situation still further by discussing not a merge of source trees, but rather a "scalar merge". A scalar merge is a merge of "scalars". Scalars have no internal structure. They are either identical or different, and if they're different then there is no way to subdivide them. For example, if every file had only a single line, then the files would act like scalars.

Note that in real-life merges, files are (approximately) broken into hunks with differences, and the hunks are treated like scalars. So if we can get a better understanding of scalar merges, then we can apply what we learn to actual merges.

Three-way merge

How does three-way merge work? The rules for three-way merges are as follows. I will use lower-case letters to indicate the contents of files.

No change on either branch:
```
a---a       a---a
|   |   →   |   |
a---?       a---a
```
That is, if the contents didn't change on either branch, then the result of the merge is the same as the original contents.
Change on one branch (from contents a to contents b) but not on the other:
```
a---b       a---b
|   |   →   |   |
a---?       a---b

a---a       a---a
|   |   →   |   |
b---?       b---b
```
In this scenario, one of the branches changed the contents from a to b, but the other branch didn't change the contents. In this case, the result of the merge is b.

Note that these pairs of diagrams are mirror images of each other, reflected along the diagonal. A general property of three-way merge is that it doesn't care which branch is which. From now on I won't bother showing diagrams that are mirror-images of each other.

This rule is the bread and butter of three-way merge. Look for it in the examples below.
The same change on both branches:
```
a---b       a---b
|   |   →   |   |
b---?       b---b
```
In this scenario, both branches changed the contents from a to b. In this case, three-way merge selects b as the result of the merge.

This is a slightly more questionable rule than the previous one. Why was the same change made on two different branches? Often this scenario arises due to cherry-picking. Even though this rule is less robust than the previous rule, in real life there would be a lot more merge conflicts without it.

There is only one scalar merge pattern that three-way merge can't resolve:

Different changes on the two branches:
```
a---b       a---b
|   |   →   |   |
c---?       c---X    ← conflict
```
In this scenario, one branch changed the contents from a to b, but the other changed the contents from a to c. There's no way for three-way merge to resolve this merge, so it results in a conflict (which I will denote with X). In real life, conflicts have to be resolved by a human.

Criss-cross merges

But it is not always the case that a merge has a single merge base. Consider, for example, a criss-cross merge:

---0-------1---2---3---4      ← main
    \       \     /     \
     \       \   /       \
      \       \ /         \
       \       ⋅           ?  ← desired merge commit
        \     / \         /
         \   /   \       /
          \ /     \     /
           A---B---C---D      ← branch

In this scanario, part of branch has been merged into main, but also part of main has been merged into branch. This merge has two merge bases, 1 and A. What happens if we now try to merge the rest of branch (commits B and D) into main? This is tricky because there are some changes (namely 1 and A) that are on both branches, but neither merge base includes both of those changes.

By default, Git tries to perform a merge like this using "recursive merge". This algorithm first does a synthetic helper merge H between the two merge bases, 1 and A:

---0-------1---2---3---4      ← main
    \       \     ⋅
     \       \   ⋅
      \       \ ⋅
       \       H
        \     / ⋅
         \   /   ⋅
          \ /     ⋅
           A---B---C---D      ← branch

Then it computes the three-way merge between 4 and D using H as the merge base:

---0-------1---2---3---4      ← main
    \       \     ⋅     \
     \       \   ⋅       \
      \       \ ⋅         \
       \       H           M
        \     / ⋅         /
         \   /   ⋅       /
          \ /     ⋅     /
           A---B---C---D      ← branch

The helper merge H is useful because it includes the changes from commits 1 and A, but doesn't include any commits twice. So it is a reasonable choice of merge base for the final merge M.

Recursive merge

Let's simplify the criss-cross merge to its bare minimum:

---A-------B---D---F          ← main
    \       \     / \
     \       \   /   \
      \       \ /     \
       \       ⋅       ?      ← desired merge commit
        \     / \     /
         \   /   \   /
          \ /     \ /
           C---E---G          ← branch

Now let's rearrange it on a rectangular grid to make the patterns easier to recognize:

A---B---D
|   |   |
C---⋅---F
|   |   |
E---G---?

By default, Git uses the "recursive merge" algorithm to deal with merges that have multiple merge bases. How does recursive merge deal with this particular scenario? And, more importantly, does it always do the right thing?

Recursive merge first tries to create a helper merge H by doing a three-way merge of B and C using A as the merge base:

A---B---D
|   |   |
C--(H)--F
|   |   |
E---G---?

Since this is done by three-way merge, this helper merge only depends on the upper-left square:

A---B
|   |
C---?

Then it does a three-way merge between F and G, using H as the merge base, producing M:

A---B---D
|   |   |
C--(H)--F
|   |   |
E---G---M

This is also a three-way merge, this time involving the lower-right square:

H---F
|   |
G---?

Each of the merges (H and M) is done using three-way merge, following the rules from the previous section. But there is one very curious and important twist:

If the helper merge H results in a conflict, it might be that the recursive merge can succeed anyway. For example, in this scenario:

a---b---d       a---b---d       a---b---d
|   |   |       |   |   |       |   |   |
c---⋅---f   →   c--(X)--f   →   c--(X)--f
|   |   |       |   |   |       |   |   |
e---f---?       e---f---?       e---f---f

Note that the first merge conflicts, because it doesn't match any of our three-way merge patterns:

a---b
|   |
c--(X)

But recursive merge continues anyway. Now it tries to perform the three-way merge from the lower-right square:

(X)--f
 |   |
 f---?

But this is a recognized three-way merge pattern, and its result doesn't depend on what "(X)" is. So the resolution is f, because f is the value at the tip of each branch:

(X)--f
 |   |
 f---f

The bottom line is that recursive merge always resolves any criss-cross merge where the two tips have the same contents to those contents, regardless of what happened earlier:

*---*---*       *---*---*
|   |   |       |   |   |
*---⋅---f   →   *---⋅---f
|   |   |       |   |   |
*---f---?       *---f---f

Menagerie of merges

How well does recursive merge match real-life expectations?

There are some scenarios that recursive merge handles just fine. There are others where it gives questionable non-conflict results, and still others where it produces a conflict even though the user's intention was arguably possible to infer. Let's go through some examples.

Scenarios where recursive merge succeeds

This section presents several scenarios where recursive merge gives a result that seems to align with what the user probably intended:

Same change made on both branches, staggered:

a---a---b       a---a---b
|   |   |       |   |   |
b---⋅---b   →   b--(b)--b
|   |   |       |   |   |
b---b---?       b---b---b

Different changes made on each branch, staggered:
```
a---b---b       a---b---b
|   |   |       |   |   |
a---⋅---b   →   a--(b)--b
|   |   |       |   |   |
c---d---?       c---d---d
```
In this case the user has already resolved the combination of b and c, namely d, so that should be the overall result.
Change b preferred over change d:
```
a---b---c       a---b---c
|   |   |       |   |   |
a---⋅---c   →   a--(b)--c
|   |   |       |   |   |
d---b---?       d---b---c
```
In this case the user has already decided that when merging b and d, the result b is preferred. So the result of applying change c to b should be c, just as it is in the first row.
Same resolution on both branches. Note that in this case the helper merge conflicts, but the final merge nevertheless works out as hoped because it matches the third three-way merge pattern:
```
a---b---b       a---b---b
|   |   |       |   |   |
c---⋅---d   →   c--(X)--d
|   |   |       |   |   |
c---d---?       c---d---d
```

Same final state on both branches:

a---b---d       a---b---d
|   |   |       |   |   |
c---⋅---f   →   c--(X)--f
|   |   |       |   |   |
e---f---?       e---f---f

Scenarios where recursive merge gives questionable results

There are some scenarios where recursive merge gives an answer (i.e., not a conflict), but it is not clear that the answer is the correct one:

Change b made then reverted on one branch; kept on the other branch. Note that there are two patterns in this family:
```
a---b---a       a---b---a
|   |   |       |   |   |
a---⋅---a   →   a--(b)--a
|   |   |       |   |   |
b---b---?       b---b---a

a---b---a       a---b---a
|   |   |       |   |   |
b---⋅---b   →   b--(b)--b
|   |   |       |   |   |
b---b---?       b---b---b
```
Worryingly, recursive merge gives different answers for the two scenarios, even though they are logically pretty similar. In both of these scenarios, the pre-existing merges are consistent with three-way merges. Indeed, they might have been done automatically without any user attention. But they each use the following, less reliable three-way merge rule for one of those pre-existing merges:
```
a---b
|   |
*   *
|   |
b---b
```
There are two plausible narratives here:
- Maybe change b was implemented then reverted on one branch, but then it was independently implemented on the other branch, possibly for another reason. In this case, it would be correct to keep b as the merge resolution.
- Maybe change b was cherry-picked from one branch to the other, then on one branch it was discovered that the change was bad, so it was reverted. In that case, one would want the revert back to a to be used as the merge resolution.
Since the correct result depends on facts that can't be known to the merge algorithm, it would probably be better to treat this as a merge conflict and ask the user to intervene.
Same change made then reverted on both branches:
```
a---b---a       a---b---a
|   |   |       |   |   |
b---⋅---b   →   b--(b)--b
|   |   |       |   |   |
a---b---?       a---b---b
```
In this case, arguably since change b was made then reverted on each of the branches, then the user merges are not needed, and the final result should be a. An alternative interpretation (which I think is less persuasive) is that the pre-done merges in both cases favored contents b over a, so the final result should also favor b. I think that the result should either be a conflict or a, whereas recursive merge gives b.

Scenarios where recursive merge unnecessarily (?) produces conflicts

There are also cases where the intention of the user seems relatively clear, but the recursive merge strategy produces conflicts. In some of these cases, it could be that git could do better than the recursive merge strategy, requiring the user to resolve fewer conflicts manually.

Most of these scenarios involve reverted changes:

Change b was made and reverted on one branch; a different change was made on the other branch:

a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c--(X)--c
|   |   |       |   |   |
c---d---?       c---d---X

Arguably, since b was reverted, merge d is irrelevant: it only shows how to merge b and c, but we don't want b in the final result. So the correct resolution in this scenario should be c:

a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c---⋅---c
|   |   |       |   |   |
c---d---?       c---d---c

Different changes made and reverted on each branch:

a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c--(X)--c
|   |   |       |   |   |
a---b---?       a---b---X

This scenario is similar to the previous case, but now both user-supplied merges are irrelevant because they merge changes b and c respectively, neither of which we want in the final result. Arguably, the result should be a:

a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c---⋅---c
|   |   |       |   |   |
a---b---?       a---b---a

Change b made and reverted on one branch, but it had no effect on the other branch:
```
a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c--(X)--c
|   |   |       |   |   |
d---d---?       d---d---X
```
In this scenario, change b was made then reverted on one branch, meaning that it should probably have no effect on the result. Moreover, when it was merged with the changes c and d made on the other branch, it was also ignored in favor of the other branch's tip d. So arguably, the correct resolution of this merge should be d:
```
a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c---⋅---c
|   |   |       |   |   |
d---d---?       d---d---d
```
Change b made and reverted on one branch:
```
a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c--(X)--c
|   |   |       |   |   |
d---e---?       d---e---X
```
This is a slightly less clear extension of the previous case. In this scenario, change b was made and reverted on one branch. Therefore, it, and merge e, are both irrelevant to the final result, and the correct resolution should arguably be d:
```
a---b---a       a---b---a
|   |   |       |   |   |
c---⋅---c   →   c---⋅---c
|   |   |       |   |   |
d---e---?       d---e---d
```

The example that motivated this study

There is one more interesting scenario, where recursive merge produces a conflict, but the correct merge resolution is pretty clear. This does not involve any reverts. This is the example that motivated this whole investigation:

The helper merge conflicts, but the exact same conflict has already been resolved by the user in one of the pre-existing merges:
```
a---b---d       a---b---d
|   |   |       |   |   |
c---⋅---e   →   c--(X)--e
|   |   |       |   |   |
c---f---?       c---f---X
```
In this case, recursive merge is unable to compute the helper merge because of a conflict between the change made in b and the change made in c. But recursive merge doesn't notice that that conflict has already been resolved by the user (twice!) in the form of commits f and e. Using this fact, we can fill in the middle blank:
```
a---b---d       a---b---d
|   |   |       |   |   |
c---⋅---e   →   c--(f)--e
|   |   |       |   |   |
c---f---?       c---f---?
```
Now note that merge e already includes all of the changes from both branches: b, c, and also d. But now the lower right corner can be done by three-way merge:
```
a---b---d       a---b---d
|   |   |       |   |   |
c--(f)--e   →   c--(f)--e
|   |   |       |   |   |
c---f---?       c---f---e
```
Therefore, e can be taken as the correct resolution in this scenario, and no conflict need be resolved by the user.

Seven-way merge

I think that these criss-cross scenarios could theoretically be handled better by a new "Seven-way merge algorithm" that takes into account the patterns of seven commits that can appear in a merge with two merge bases. It would match those commits to a table of seven-way merge rules, and return the appropriate result (which will sometimes be a conflict). This algorithm could be applied one hunk at a time. I think that there is a good chance that it would result in fewer incorrect merges and fewer unnecessary conflicts.

One big question is, would this approach generalize to tackling even more complicated merges? For example, what if the merge has even more than two merge bases? Can seven-way merges be composed?

How often does it come up?

Seven-way merge would probably only be useful for merges with multiple merge bases. Do these come up often enough to make the effort worthwhile?

Maybe not. I sampled a few largish repositories (see below), and only on the order of 1% of merges in those repos have multiple merge bases. And not even all of those merges are criss-cross merges (some have three or more merge bases).

On the other hand, merge conflicts are a very frustrating aspect of working with Git, and almost everybody has experienced merge conflicts that were too difficult to resolve. Moreover, when a multiple-merge-base merge fails, it can fail in a most baffling way, with conflict markers nested inside of other conflict markers, that is almost impossible to make sense out of. 7-way merge might be able to produce more understandable conflicts in some of these cases. Finally, some workflows generate more criss-cross merges than we might be accustomed to (I've heard of a very large repository in which more like 10% of merges have multiple merge bases). Is it possible that smarter handling of criss-cross merges could remove some of this frustration?

If we decide to pursue this, I would suggest the following as next steps:

Choose some public repositories with various workflows (i.e., don't just pick the best-organized projects). Perhaps include both well-maintained projects but also "run of the mill" projects. Locate merges in the history that had more than one merge base. Check if they had conflicts. Also check whether there were diff hunks that didn't conflict, but where users changed the automatic merge resolution (sometimes these will be incorrect merges).
Implement smarter handling of criss-cross merges. Instead of trying to find a clever algorithm that does the right thing in all situations, I think that it would be better to recognize some of the above patterns using a table that also tells how to handle the corresponding scenario. At first this can be a hacky, inefficient implementation.
Apply the standard recursive merge algorithm and the new criss-cross merge algorithm to the sample of real-life merges. Compare the numbers of conflicts and the numbers of seemingly incorrect merges. Decide which of the patterns should be enabled and which should be treated as conflicts.
If the results are encouraging, implement whatever is decided on in Git, so that users get a little bit better help with their merges.

Appendix: numbers of merge bases

What fraction of merge commits have multiple merge bases? I ran the following to count the number of non-octopus merges with each number of merge bases in the project's default branch:

for f in $(git rev-list --min-parents=2 --max-parents=2 HEAD)
do
    echo "$f" $(git merge-base --all $f^1 $f^2)
done | awk '{print NF-1}' | sort -n | uniq -c

I ran the above command in git/git, torvalds/linux, and a large proprietary repository.

Note that 7-way merge as strictly described above would only apply to merges with exactly two merge bases.

`git/git`

In git/git, 1.39% of merges have exactly two merge bases:

Merge bases	Count
0	6
1	15416
2	223
3	69
4	40
5	42
6	26
7	26
8	17
9	17
10	20
11	15
12	13
13	7
14	10
15	8
16	5
17	3
18	1
19	4
20	6
21	2
22	6
23	6
24	2
26	3
28	3
29	1
34	1
38	1
40	1
42	1
47	1
65	1

`torvalds/linux`

In torvalds/linux, 2.51% of merges have exactly two merge bases:

Merge bases	Count
0	3
1	63182
2	1634
3	212
4	81
5	37
6	21
7	17
8	9
9	5
10	7
12	2
13	1
29	1
52	1
53	1
55	1
78	1
85	1
93	2

Example large proprietary repository

In this repository, 0.88% of merges have exactly two merge bases:

Merge bases	Count
0	4
1	330375
2	2948
3	29
4	5
7	1
13	1

Monday, March 3, 2014

My secret tip for GSoC success

If you are applying for Google Summer of Code, there is one thing you can do to make yourself beloved in your project (and isn't bad advice in general, either):

DON'T WASTE OTHER PEOPLE'S TIME

I don't say this in a cantankerous, "get off of my lawn" way. I mean it literally.

Time

It is important to understand the motivations of a typical open-source developer. Most people contribute to open-source in their free time, out of joy. The time we choose to work on open source is time we could otherwise spend hanging out with friends and family, or programming for money, or reading books, or watching television, or just sleeping. We steal time from the rest of our lives because we love to code and we want to make the world a little bit better by creating great software [1].

In the open-source world, time is the coin of the realm.

Now evaluate GSoC from an OS contributor's point of view:

Google will pay a student to work full time on my project. Great!
Maybe the student will turn into a regular contributor, and stick around after GSoC is over. Awesome!
Maybe I can help the student become a better developer. Sounds good!

but on the other hand...

It will probably take as much time to mentor the student as it would take me to implement the feature myself.
I will have to spend that time mentoring (which I may or may not enjoy), rather than coding (which I love).
The student might turn out to be a dud, who sucks up lots of my time without producing anything.

Note that all of the downsides of mentoring a student have to do with its cost in time. The way to be an awesome GSoC student is to optimize the way you use your mentor's time.

Recipe for success

That brings us back to the advice of this article:

DON'T WASTE OTHER PEOPLE'S TIME

Mentors will generally be happy to give you the help you need [2]. They will point you to the documentation and code that you need to study. They will be delighted to discuss large-scale issues of code design (e.g., where do your changes fit in to the larger project?) They will be willing to review your code. All of these things are good uses of your mentor's time.

Your side of the bargain is to use the mentor's time as efficiently as possible. You need to extract the most benefit out of every interaction with your mentor.

If your mentor points you to some documentation, read it. Memorize it! Read other documentation that is nearby. Don't ever, ever, ask another question that is covered by that piece of documentation. If the documentation is unclear, it is OK to ask for a clarification, but then fix the documentation so that your mentor never has to answer the same question again.
If your mentor points you to some code, study it. Understand it. Figure out not just how but also why it does what it does. Read the other code that calls it, and the other code that it calls. Run git blame on the code and read the commit messages from when the code was implemented. If you think you have found a problem or bug in the code, fix it if you can. If you can't fix it, then at least investigate it thoroughly and write a complete bug report to the mailing list explaining what you think is wrong and where you got stuck. Add a test to the project's test suite that demonstrates the problem.
Don't waste your mentor's time on trivia. Follow your project's coding guidelines about submitting patches, code formatting, etc. Make your code look like existing code! If your mentor points out an error, no matter how trivial you might think it to be, fix the error and never make the same error again [3].
Help your mentor save time in other areas. Get involved in the project like any other developer. Help review other people's patches. Help answer user questions on the mailing list or on IRC. Write documentation and tests. Improve the project's website.

None of this is to say that you shouldn't ask for help when you need it. For example, it is a good idea to describe your plans and ask for feedback before you start coding--such high-level interactions are a very effective use of your mentor's time and will actually save time later. But remember that you have a finite budget of your mentor's time, so use it wisely!

Summary

Always keep in mind your role as a GSoC student: you are an apprentice [4]. This means you will be expected to spend some time running errands and sweeping floors. In return, you will have a great opportunity to learn from the world's best developers on a one-on-one basis, get invaluable experience, and have lots of time doing what you love--writing great code!

Notes

[1]	By the way, if these are not the reasons that you want to participate in GSoC, then maybe you are in the wrong place to begin with.

[2]	Indeed, you have the right to politely insist that your mentor gives you the time that you need. After all, your mentor has voluntarily signed up with the GSoC to be a mentor, and mentoring a student is, well, the basic obligation.

[3]

If you think that the project should actually move its curly braces to the same line as the if statement (or vice versa), you are wrong. Seasoned developers know that there are only two important things about code formatting: all code within a project must be formatted the same way, and all debates about code formatting are a waste of everybody's time.

[4]	Though ironically, you might be the only person in your project actually getting paid for your time!

Wednesday, May 15, 2013

"Obsolete markers" in Mercurial

I just heard on GitMinutes #07 that Mercurial will soon have a feature that records the relationship between old and new versions of changesets when a changeset is rewritten. This is a topic that I find very interesting, so I did some reading.

Each "obsolete marker" is a many-to-one record stating that changesets [A1,A2,...] obsolete changeset A. This information can be shared between repositories. The obsolete changeset is retained as long as any non-obsolete changesets depend on it, after which it can be discarded. (Though it wasn't clear to me how repository 1 can know whether repository 2 contains commits based on a commit that has been marked obsolete in repository 1; perhaps human-to-human communication is required?)

Unlike regular parent relationships, these markers have no implications for commit ancestry. That is, the fact that changeset A' obsoletes changeset A does not imply that commit A' includes all of the ancestors of commit A. Because of this, obsolete markers can be used even if changesets are reordered, split, or squashed--actions that cannot be represented in an ancestry-only DAG.

If I understand correctly, obsolete markers should really be thought of as relationships between changesets (i.e., the deltas introduced by commits) as opposed to the commits (and tree states) themselves. It appears that they are intended mostly for transitional use before the history has solidified, and not as a permanent part of the record.

Because obsolete markers introduce a new type of "history", they complicate the history model. There are new types of conflicts that can arise because of them, and such conflicts have to be resolved in new ways.

I'm very curious how obsolete markers pan out in practice. It's definitely a feature that is worth keeping an eye on.

Wednesday, May 8, 2013

git-imerge: a practical introduction

git-merge on steroids / git-rebase for the masses
This article is a practical introduction to incremental merging using git-imerge: why you need it, what it does, and how to use it.
(Don't fear; this article is not as long as it looks. Most of it consists of big ASCII-art illustrations and example program output.)

Why incremental merge?

The traditional alternatives for combining two branches both have serious problems.

`git merge` pain

You need to resolve one big conflict that mixes up a lot of little changes on both sides of the merge. (Resolving big conflicts is hard!)
Merging is all-or-nothing:
- There is no way to save a partly-done merge, so
  - You can't record your progress.
  - You can't switch to another branch temporarily.
  - If you make a mistake, you cannot revert only part of the merge.
  If you cannot resolve the whole conflict, there is nothing to do but start over.
- There is no way to test a partly-done merge--the code won't even build until the conflict is completely resolved.
It is difficult to collaborate on a merge with a colleague.

`git rebase` pain

It is very problematic to rebase work that has been published.
- Rebasing is unfriendly to collaboration.
You have to reconcile each of the branch commits with all of the changes on master.
Rebasing is all-or-nothing:
- It is awkward to interrupt a rebase while it is in progress: you're not even on a branch, and rebase doesn't preserve the relationship between old commits and new.
Rebasing discards history (namely the old version of the branch).
Rebasing often requires similar conflicts to be resolved multiple times.

Incremental merge

git-imerge implements a new merging method, incremental merge, that reduces the pain of merging to its essential minimum. git-imerge:

Presents conflicts pairwise: you only ever need to resolve one commit from each branch at a single time.
- Small conflicts (much easier to resolve than large conflicts).
- You can view commit messages and individual diffs to see what each commit was trying to do.
Records all intermediate merges with their correct ancestry, as commits in your repository. An incremental merge that is in progress
- ...can be interrupted.
- ...can be pushed to a server.
- ...can be pulled by a colleague, worked on, and pushed again.
Never shows the same conflict twice. Once a conflict has been resolved, it is stored in the DAG to make future merges easier.
Lets you test every intermediate state. If there is a problem, you can use "git bisect" to find the exact pairwise merge that was faulty. You can redo that merge and continue the incremental merge from there (retaining earlier pairwise merges).
Is largely automated and surprisingly efficient.
Allows the final merge to be simplified for the permanent record, omitting the intermediate results.

The basic idea

Suppose you want to merge "branch" into "master":

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
     \
      A - B - C - D - E - F - G - H - I                ← branch

First draw the diagram a bit differently:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

Now start filling it in, merging one pair at a time:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |
    A - A1
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

"A1" is a merge between a single commit ("1") on master and a single commit ("A") on branch.
Continue...

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |
    A - A1
    |   |
    B - B1
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

"B1" is a merge between "A1" and "B". Since "A1" already includes the changes from "1" and "A", this merge only has to add the changes from commit "B", which are hopefully limited in scope. If there is a conflict, it is hopefully small.
Most of these pairwise merges will not conflict, and git-imerge will let do them for you automatically until it finds a conflict:

o - 0 - 1  - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |    |
    A - A1 - A2
    |   |    |
    B - B1 - B2
    |   |    |
    C - C1 - C2
    |   |    |
    D - D1 - D2
    |   |    |
    E - E1 - E2
    |   |
    F - F1   X
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch

Now you have to help resolve the conflict that arose at "X" when merging "E2" and "F1". But Git knows that these two commits share "E1" as a common ancestor ("merge base"), so all you have to do is resolve the change originally made in master commit "2" with the change that was made in branch commit "F":

o - 0 - 1  - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |    |
    A - A1 - A2
    |   |    |
    B - B1 - B2
    |   |    |
    C - C1 - C2
    |   |    |
    D - D1 - D2
    |   |    |
    E - E1 - E2
    |   |    |
    F - F1 - F2
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch

Continue in this manner until the diagram is complete:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11

    ↑
  branch

Done!

Simplifying the results

A completed incremental merge contains all of the information you could possibly want to know about combining two branches; all you need to do now is discard any information that you don't want in your permanent record. For example,

Commit "I11" is the simple merge of "branch" into "master":

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11
    |                                                       |
    A                                                       |
    |                                                       |
    B                                                       |
    |                                                       |
    C                                                       |
    |                                                       |
    D                                                       |
    |                                                       |
    E                                                       |
    |                                                       |
    F                                                       |
    |                                                       |
    G                                                       |
    |                                                       |
    H                                                       |
    |                                                       |
    I ---------------------------------------------------- I11'  ← master

    ↑
  branch

(Usually such a diagram is drawn like so:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 - I11'  ← master
     \                                               /
      A -- B -- C --- D --- E --- F --- G --- H --- I       ← branch

The rightmost column is the rebase of "branch" onto "master":

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
                                                            |
                                                           A11'
                                                            |
                                                           B11'
                                                            |
                                                           C11'
                                                            |
                                                           D11'
                                                            |
                                                           E11'
                                                            |
                                                           F11'
                                                            |
                                                           G11'
                                                            |
                                                           H11'
                                                            |
                                                           I11'  ← branch

The bottommost row is the rebase of "master" onto "branch":

o - 0
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I - I1'- I2'- I3'- I4'- I5'- I6'- I7'- I8'- I9'- I10'- I11'  ← master

    ↑
  branch

If you have already published branch, consider converting this into a rebase with history:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |                                                       |
    A ---------------------------------------------------- A11'
    |                                                       |
    B ---------------------------------------------------- B11'
    |                                                       |
    C ---------------------------------------------------- C11'
    |                                                       |
    D ---------------------------------------------------- D11'
    |                                                       |
    E ---------------------------------------------------- E11'
    |                                                       |
    F ---------------------------------------------------- F11'
    |                                                       |
    G ---------------------------------------------------- G11'
    |                                                       |
    H ---------------------------------------------------- H11'
    |                                                       |
    I ---------------------------------------------------- I11'

                                                            ↑
                                                          branch

Rebase-with-history is a useful hybrid between rebasing and merging, having the advantages that it retains both the old and the new versions of the rebased commits, and that it is a valid way of rebasing already-published history without the known problems of a simple rebase.

Efficiency

In most cases git-imerge does not have to construct the full incremental merge. It uses an efficient algorithm, based on bisection, for filling in large blocks of the incremental merge quickly and homing in on the conflicts. A typical in-progress merge might look like this:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A -   --   --   --   --   -- A6 -   -- A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B -   --   --   --   --   -- B6 - B7 - B8   X
    |   |    |    |    |    |    |
    C -   --   --   --   --   -- C6   X
    |   |    |    |    |    |    |
    D -   --   --   --   --   -- D6
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6
    |   |
    F - F1   X
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch

where the "X"s marks pairwise merges that are known to conflict (i.e., they need user attention), and the cross-hatched rectangles are merges that could be skipped because the merges on the boundaries of the rectangles were all successful.

git-imerge

git-imerge implements incremental merging for Git. This section shows how to use git-imerge to do a simple merge.

You begin much like with git merge: check out the destination branch and then tell git imerge what branch you want to merge into it:
```
$ git checkout master
$ git imerge start --name=merge-branch --first-parent branch
```
Multiple imerges can be in progress simultaneously, so you have to give each one a name, in this case "merge-branch".

git-imerge then uses bisection to find pairwise merges that conflict, and asks user to fix them one pair at a time:

Attempting automerge of 1-1...success.
Attempting automerge of 1-4...success.
Attempting automerge of 1-6...success.
Attempting automerge of 9-6...failure.
Attempting automerge of 5-6...failure.
Attempting automerge of 3-6...success.
Attempting automerge of 4-6...failure.
Attempting automerge of 4-1...success.
Attempting automerge of 4-4...failure.
Attempting automerge of 4-3...failure.
Attempting automerge of 4-2...success.
Attempting automerge of 9-2...success.
Autofilling 1-6...success.
Autofilling 2-6...success.
Autofilling 3-1...success.
Autofilling 3-2...success.
Autofilling 3-3...success.
Autofilling 3-4...success.
Autofilling 3-5...success.
Autofilling 3-6...success.
Autofilling 4-2...success.
Autofilling 5-2...success.
Autofilling 6-2...success.
Autofilling 7-2...success.
Autofilling 8-2...success.
Autofilling 9-1...success.
Autofilling 9-2...success.
Attempting automerge of 4-3...failure.
Switched to branch 'imerge/c-d'
Auto-merging conflict.txt
CONFLICT (add/add): Merge conflict in conflict.txt
Automatic merge failed; fix conflicts and then commit the result.

Original first commit:
commit b7fe8a65d0cee2e388e971c4b29be8c6cbb25ee1
Author: Lou User <luser@example.com>
Date:   Fri May 3 14:03:05 2013 +0200

    c conflict

Original second commit:
commit bd0373cadae08d872536bcda8214c0631e19945a
Author: Lou User <luser@example.com>
Date:   Fri May 3 14:03:05 2013 +0200

    d conflict

There was a conflict merging commit 4-3, shown above.
Please resolve the conflict, commit the result, then type

    git-imerge continue

Please note that git-imerge tells you exactly which of the original commits need to be combined in this merge, and displays their log messages.

You can get a diagram of the current state of the merge (it is in crude ASCII-art for now):

$ git imerge diagram
**********
*??|?????|
*--+-----+
*??|#?????
*??|??????
*??|??????
*--+??????

Key:
  |,-,+ = rectangles forming current merge frontier
  * = merge done manually
  . = merge done automatically
  # = conflict that is currently blocking progress
  @ = merge was blocked but has been resolved
  ? = no merge recorded

After you fix the indicated conflict and commit the result, run git imerge continue to tell git-imerge to incorporate the result and proceed to the next conflict:

$ git add ...
$ git commit
$ git imerge continue
Merge has been recorded for merge 4-3.
Attempting automerge of 5-3...success.
Attempting automerge of 5-3...success.
Attempting automerge of 9-3...success.
Autofilling 5-3...success.
Autofilling 6-3...success.
Autofilling 7-3...success.
Autofilling 8-3...success.
Autofilling 9-3...success.
Attempting automerge of 4-4...success.
Attempting automerge of 4-5...success.
Attempting automerge of 4-6...success.
Attempting automerge of 9-6...success.
Autofilling 4-6...success.
Autofilling 5-6...success.
Autofilling 6-6...success.
Autofilling 7-6...success.
Autofilling 8-6...success.
Autofilling 9-4...success.
Autofilling 9-5...success.
Autofilling 9-6...success.
Merge is complete!
$ git imerge diagram
**********
*??.?????|
*??......|
*??.*....|
*??.?????|
*??.?????|
*--------+

Key:
  |,-,+ = rectangles forming current merge frontier
  * = merge done manually
  . = merge done automatically
  # = conflict that is currently blocking progress
  @ = merge was blocked but has been resolved
  ? = no merge recorded

When you are done, simplify the incremental merge into a simple merge, a simple rebase, or a rebase-with-history:

$ git imerge finish --goal=merge

By default, this creates a new branch NAME that points at the result, and checks out that branch:

$ git log -1 --decorate
commit 79afd870a52e216114596b52c800e45139badf74 (HEAD, merge-branch)
Merge: 8453321 993a8a9
Author: Lou User <luser@example.com>
Date:   Wed May 8 10:08:07 2013 +0200

    Merge frobnicator into floobifier.

(It also deletes all of the temporary data.)

Intermediate data

During an incremental merge, intermediate results are stored directly in your repository as special references:

refs/imerge/NAME/state: A blob containing a little bit of metadata.
refs/imerge/NAME/manual/M-N: Manual merge including all of the changes through commits M on master and N on branch.
refs/imerge/NAME/auto/M-N: Automatic merge including all of the changes through commits M on master and N on branch.
refs/heads/imerge/NAME: Temporary branch used when resolving merge conflicts.
refs/heads/NAME: Default reference name for storing final results.

Project status

As of this writing, git-imerge is very new and still experimental. Please try it out, but use it cautiously (e.g., on a clone of your main Git repository) and give me your feedback!

Contributing

I have lots of ideas for additional features, and am sure that you do too. See the project TODO list for inspiration, and get involved!

For more information:

The git-imerge project home page on GitHub.

Blog articles about incremental merge

Blog articles about rebase-with-history

Sunday, May 5, 2013

Incremental merge vs. direct merge vs. rebase

	Direct merge	Rebase	Incremental merge
Basic commands	git checkout master git merge branch <resolve big conflict> git commit	git checkout branch git rebase master while not done: <resolve smaller conflict> git rebase --continue	git checkout master git-imerge start [...] branch while not done: <resolve pairwise conflict> git commit [...] git-imerge continue git-imerge simplify [...] (This uses the `git-imerge` tool [1].)
Size of conflicts	All of the changes in master have to be combined with all of the changes in branch in one go. Various changes are mixed up together, even if they were originally committed separately, which can make the conflict large or even intractable.	One step of a rebase requires one change from branch to be resolved with all of the changes from master. This is easier than doing a direct merge in one step, but still mixes up unrelated changes.	At any step, only one change from master has to be combined with one change from branch. The commit messages and deltas from each of the two changes are available to help you understand the two changes.
Effort	The least amount of effort, if there are only a few simple conflicts. Effort increases highly nonlinearly in the size of the conflict.	A bit more effort, if conflicts are simple and isolated. Conflicts tend to be smaller than for direct merge, but more numerous. If there are sequential overlapping conflicts, you might need to resolve similar conflicts multiple times.	More processing power required at start of merge. Conflicts are resolved as lots of little conflicts, so the effort scales closer to linearly with the number of pairwise conflicts [2]. Benefits decrease if original changes were not committed in small, self-contained steps.
Interruptibility	After you start a direct merge, there is no way to record partial results. You are forced to either complete the whole merge and then commit it, or abandon the whole merge using `git merge --abort`. Progress on a direct merge is all-or-nothing.	Once a rebase is started, it is awkward to interrupt it [3]. It is true that intermediate results are committed, but because the rebase discards the connection between original commits and rebased commits, you have to do the bookkeeping yourself.	Each of the pairwise merges is resolved and committed separately. After each pairwise merge is committed, you can switch branches, do something else for a while, then return later to the incremental merge. Progress on an incremental merge can be locked in in small increments.
Testability, fixability	The results of a direct merge cannot be tested until the whole merge is done. If you mess up one part of the merge, there is no easy way to revert only that part.	The individual rebased commits can be tested. But if there is a failure, it is nontrivial to figure out which of the changes on master interacted badly with the rebased commit.	Each pairwise merge can be tested individually. If one fails testing, it can be discarded and redone without having to discard earlier pairwise merges. If a failure is discovered after the incremental merge is done, it is possible to use `git bisect` to find out exactly which of the pairwise merges was faulty.
Collaboration	Since an in-progress direct merge is not savable, the only way to get help from a colleague on a direct merge is to call the colleague to your computer and have him/her work in your working copy.	It is strongly recommended that a branch that has been published not be rebased. Thus rebasing is usually applied only to local work done by a single developer. It is awkward to share the work of rebasing a branch [3].	Each pairwise merge commit is a normal commit that can be exchanged with other colleagues via push/fetch/etc. Each pairwise conflict can be resolved by the person most familiar with that part of the code. In fact, if you plan to retain the whole incremental merge in your repository history, then no history rewriting is necessary at all, and there is no problem merging a published branch, or publishing the intermediate merge commits as you go.
Recommendations	In a merge-based workflow, use direct merge for simple merges. If a difficult conflict arises, switch to incremental merge and then simplify the results into a simple merge.	In a rebase-based workflow with unpublished local changes, use rebase to update your work onto upstream changes if you only expect trivial conflicts. If you expect (or, midway through the rebase, encounter) nontrivial conflicts, switch to incremental merge and then simplify the results into a rebase.	Use incremental merge when: a simple merge or rebase is desired but the conflicts are nontrivial you would like to rebase work that has already been published (simplify results to rebase-with-history) you want to retain a permanent record of how conflicts were resolved you want to collaborate on a merge or rebase

Notes:

[1]	`git-imerge` is a new tool that I am working on to automate incremental merging. It is available as open-source from its public Git repository.

[2]	If development was truly chaotic, and especially if it involved changes that were later reverted, then an incremental merge might be much more effort than a simple merge because it requires you to resolve conflicts between changes that are obsolete.

[3]

(1, 2)

To pause and/or share the work of rebasing, add a reference to the last successful rebased commit using git branch rebase-wip before aborting the rebase with git rebase --abort. The result will look something like:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |                                                       |
    A                                                       A'
    |                                                       |
    B                                                       B'
    |                                                       |
    C                                                       C'
    |                                                       |
    D                                                       D'
    |
    E                                                       ↑
    |                                                   rebase-wip
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

Such an interrupted rebase can be continued using a command like git rebase --onto rebase-wip D branch, where D is the SHA1 of the last commit that was successfully rebased in the first attempt. The work can be shared by publishing tips branch and rebase-wip, but care should be taken when doing so to prevent other people from basing their own work on the pre-rebased branch branch.

One merge to rule them all

In earlier articles, I explained how to break up a difficult merge into simple pairwise merges, how to efficiently locate the pairwise merges that block the full merge, how to make pictures of the merge conflict frontier, and how to complete a full incremental merge. What can we do with an incremental merge once it is complete?
An incremental merge, once completed, contains all of the information you could possibly want about combining two branches. Specifically, it records how to do each of the N*M pairwise merges, where N and M are the number of commits on each branch. Each of these submerges is stored as a git commit with an ancestry that correctly reflects its contents. In fact, an incremental merge might contain more information than you want to store permanently in your repository. In this article, I will explain how to use an incremental merge to derive a simple merge, a rebase (in either direction), or a rebase-with-history.
In later articles I will provide a table comparing incremental merge with simple merge and simple rebase, explain how to compute an incremental merge more efficiently, and explain git-imerge, a new tool that automates incremental merging.

The starting point

For the purposes of this discussion, we will assume that we have already completed a full incremental merge as described in my previous article and that it looks like this [1]:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11

    ↑
  branch

Remember that commits "0"-"11" are on master, and commits "A"-"I" are on the branch. (It goes without saying that there is nothing special about master; any two branches can be merged in this manner.) The rest of the commits are pairwise merges between the neighbor above and the neighbor to the left. For example, commit "C3" is the pairwise merge between "B3" and "C2", and it includes all of the changes through master commit "C" and branch commit "3".
The most interesting commit is "I11". It contains all of the changes through master commit "11" and branch commit "I". In other words, its contents are the merge of "master" and "branch". By resolving simple pairwise conflicts, we have managed indirectly to resolve the full conflict.
Incremental merging is basically a divide and conquer approach to merging. It replaces the task of resolving one big hairy merge conflict with the task of resolving many little conflicts. Since big conflicts are much harder to resolve than small conflicts, this can be a huge win, and can even make it possible to perform incrementally a merge that would be intractable in a single step.

Using the results of an incremental merge

Now that we have a complete incremental merge, how can we use the results?
In following sections I will explain how to winnow the information from an incremental merge down into any one of the following:

Obtaining one of these results is done by rewriting history to selectively discard the intermediate commits that are not wanted, while adjusting the parentage of the commits that are retained.

Simple merge

If your goal is a simple merge of branch into master, (with none of the intermediate results), then all you need from the incremental merge is commit "I11". Specifically, you need a single, new merge commit with the contents of "I11" but with commits "11" and "I" as parents:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11
    |                                                       |
    A                                                       |
    |                                                       |
    B                                                       |
    |                                                       |
    C                                                       |
    |                                                       |
    D                                                       |
    |                                                       |
    E                                                       |
    |                                                       |
    F                                                       |
    |                                                       |
    G                                                       |
    |                                                       |
    H                                                       |
    |                                                       |
    I ---------------------------------------------------- I11'  ← master

    ↑
  branch

It may be easier to recognize this diagram when it is redrawn in the conventional orientation:

o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 - I11'  ← master
     \                                               /
      A -- B -- C --- D --- E --- F --- G --- H --- I       ← branch

Rebase of "branch" onto "master" or vice versa

If you prefer to rebase "branch" onto "master", what you want are the contents of commits "A11" through "I11", but with their second parents omitted to convert them from merge commits into simple, linear commits:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
                                                            |
                                                           A11'
                                                            |
                                                           B11'
                                                            |
                                                           C11'
                                                            |
                                                           D11'
                                                            |
                                                           E11'
                                                            |
                                                           F11'
                                                            |
                                                           G11'
                                                            |
                                                           H11'
                                                            |
                                                           I11'  ← branch

Similarly, it is possible to use the results to rebase "master" onto "branch" (i.e., as if the changes made on "branch" had actually been made earlier in the history of "master") by retaining commits "I1".."I11" with their first parents omitted:

o - 0
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I - I1'- I2'- I3'- I4'- I5'- I6'- I7'- I8'- I9'- I10'- I11'  ← master

    ↑
  branch

Rebase with history

It is also possible to rebase "branch" onto "master", but retaining part of the history. This is a very useful hybrid between rebasing and merging, having the advantages that it retains both the old and the new versions of the rebased commits, and that it is a valid way to rebase already-published history without the known problems of a simple rebase [2]. A rebase with history is extracted from a full incremental merge by retaining commits "A11" through "I11", but adjusting their second parents to point at "A" through "I" respectively:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |                                                       |
    A ---------------------------------------------------- A11'
    |                                                       |
    B ---------------------------------------------------- B11'
    |                                                       |
    C ---------------------------------------------------- C11'
    |                                                       |
    D ---------------------------------------------------- D11'
    |                                                       |
    E ---------------------------------------------------- E11'
    |                                                       |
    F ---------------------------------------------------- F11'
    |                                                       |
    G ---------------------------------------------------- G11'
    |                                                       |
    H ---------------------------------------------------- H11'
    |                                                       |
    I ---------------------------------------------------- I11'

                                                            ↑
                                                          branch

Analogously, it is possible to rebase "master" onto "branch" with history.

Maybe keep everything?

As we have seen, a full incremental merge is quite powerful: it contains enough information to derive all of the common rebase/merge results as special cases.
Come to think of it, why should we have to decide between merge vs. rebase at all? Why not retain all of the intermediate commits in our history? Retaining the full merge history is simple; we just reset the "master" reference to "I11" without rewriting any commits:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11   ← master

    ↑
  branch

The resulting repository holds complete information about how the individual commits were pairwise merged, which conflicts had to be resolved manually, and how they were resolved (hopefully explained in the individual log messages). It contains (as special cases) all of the information that would be recorded for a simple merge, as well as all of the information that would be retained by a rebase of "branch" on "master" or vice versa, with or without history.
However, retaining the whole history of the incremental merge, though arguably the correct approach, causes the permanent retention of a lot of intermediate commits that will--in practice--probably never be needed. Moreover, some git tools (like gitk) aren't able to display such a complicated history very well. Therefore, it is probably best to use incremental merge as a tactic to achieve a more limited result.

Still to come

In future articles I plan to present a sparse variant of the full incremental merge and describe git-imerge, a tool that automates incremental merging.

Notes:

[1]	In most cases it is overkill to complete the full incremental merge, and a sparser variant would be sufficient. But it is easier to understand sparse incremental merges after we have analyzed and discussed the full version. So this article we assume that the full incremental merge is available.

[2]	The problems of rebasing published work is described, for example, in the git-rebase(1) manpage section "Recovering from Upstream Rebase".

Git incremental merge

In earlier articles, I explained how to break up a difficult merge into simple pairwise merges, how to efficiently locate the pairwise merges that block the full merge, and how to make pictures of the merge conflict frontier. How can these insights be used to actually resolve a full, nightmare merge?

This article will describe how to incrementally resolve the conflicts in a merge diagram, one pairwise conflict at a time. Each time a conflict is manually resolved, the conflict frontier is pushed back and more of the merge diagram becomes green. When the whole conflict diagram is green, the full incremental merge is done.

The amazing thing is that an incremental merge, once completed, contains all of the information you could possibly want about combining two branches. In my next article I will show, given a full incremental merge, how to derive a simple merge, a rebase (in either direction), or a rebase-with-history.

In future articles I will summarize the differences between incremental merge, git merge, and git rebase, discuss how to reduce the amount of work needed to resolve a merge conflict, and explain git-imerge, a tool that automates the process of incremental merging.

Pushing back the conflict frontier

In an earlier article, I showed how to create a diagram of a merge that was partly blocked by conflicting pairwise merges:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   x     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |
    F - F1   x    x    x    x    x    x    x    x     x     x
    |   |
    G - G1   x    x    x    x    x    x    x    x     x     x
    |   |
    H - H1   x    x    x    x    x    x    x    x     x     x
    |   |
    I - I1   x    x    x    x    x    x    x    x     x     x

    ↑
  branch

Remember: in this diagram, commits "0"-"11" are on master and commits "A"-"I" are on the branch. The rest of the diagram shows incremental merges when they are possible; e.g., commit "C3" is a successful merge between commit "C2" and commit "B3". When a pairwise merge fails, the diagram shows an "x"; for example Git was unable to merge "E2" and "F1", which implies that there was some conflict between the changes introduced in master commit "2" and those introduced in branch commit "F".

But you are smarter than Git, so (hopefully) you can merge "E2" and "F1" manually. Among your advantages over Git is that you can understand the commit messages for commits "2" and "F", so you know what those two commits were trying to achieve. Your task is to create a commit "F2" that contains the changes made in both "2" and "F" [1]. Once you resolve those two changes and commit the results as "F2", the diagram is transformed to:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   x     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |    |
    F - F1 - F2                       x    x    x     x     x
    |   |
    G - G1                            x    x    x     x     x
    |   |
    H - H1                            x    x    x     x     x
    |   |
    I - I1                            x    x    x     x     x

    ↑
  branch

Not only has "F2" been added to the diagram, but also a white area has opened up below and to the right of "F2". These merges used to be blocked by the conflict at "F2". But since we have merged "F2" manually, it might be that Git can now do some or all of these merges automatically. For example, we can now attempt to merge "E3" and "F2" to create "F3", or merge "F2" and "G1" to create "G2", and continue filling in the white area until we are blocked by new conflicts.

If we are lucky, the conflict "F2" was isolated and the whole block is now mergeable:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8   x     x     x
    |   |    |    |    |    |    |
    C - C1 - C2 - C3 - C4 - C5 - C6   x    x    x     x     x
    |   |    |    |    |    |    |
    D - D1 - D2 - D4 - D4 - D5 - D6   x    x    x     x     x
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6   x    x    x     x     x
    |   |    |    |    |    |    |
    F - F1 - F2 - F3 - F4 - F5 - F6   x    x    x     x     x
    |   |    |    |    |    |    |
    G - G1 - G2 - G3 - G4 - G5 - G6   x    x    x     x     x
    |   |    |    |    |    |    |
    H - H1 - H2 - H3 - H4 - H5 - H6   x    x    x     x     x
    |   |    |    |    |    |    |
    I - I1 - I2 - I3 - I4 - I5 - I6   x    x    x     x     x

    ↑
  branch

Of course we might not be so lucky, and new conflicts might be revealed in the rectangle previously blocked by "F2". But either way, we have made at least some progress in pushing back the conflict frontier.

If we continue merging commits pairwise--automatically when possible, manually when necessary--eventually we will clear the whole diagram of conflicts. When this is accomplished, the incremental merge is done:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11

    ↑
  branch

But when the incremental merge is done, the full merge is also done! Commit "I11" combines all of the changes from "master" with all of the changes from "branch"; in other words, it is the merge of "master" and "branch". The merge nightmare is over.

Still to come

My next article will explain how to convert an incremental merge into a simple merge, a simple rebase, or a rebase with history. In future articles I will compare the available merge alternatives, present less exhaustive variants of the full incremental merge and describe git-imerge, a tool that automates incremental merging.

Notes:

[1]

The technical reason why merging "E2" and "F1" into "F2" is simpler than merging "2" and "F" directly is that "E2" and "F1" share nearby commit "E1" as a merge base (see the documentation for git-merge-base(1)). By contrast, the merge base of "2" and "F" is commit "0", which is further away from the commits to be merged.

SoftwareSwirl

Monday, September 12, 2022

Beyond three-way merge

Background

Scalar merge

Three-way merge

Criss-cross merges

Recursive merge

Menagerie of merges

Scenarios where recursive merge succeeds

Scenarios where recursive merge gives questionable results

Scenarios where recursive merge unnecessarily (?) produces conflicts

The example that motivated this study

Seven-way merge

How often does it come up?

Appendix: numbers of merge bases

git/git

torvalds/linux

Example large proprietary repository

Monday, March 3, 2014

My secret tip for GSoC success

Time

Recipe for success

Summary

Notes

Wednesday, May 15, 2013

"Obsolete markers" in Mercurial

Wednesday, May 8, 2013

git-imerge: a practical introduction

Why incremental merge?

git merge pain

git rebase pain

Incremental merge

The basic idea

Simplifying the results

Efficiency

git-imerge

Intermediate data

Project status

Contributing

For more information:

Sunday, May 5, 2013

Incremental merge vs. direct merge vs. rebase

Notes:

One merge to rule them all

The starting point

Using the results of an incremental merge

Simple merge

Rebase of "branch" onto "master" or vice versa

Rebase with history

Maybe keep everything?

Still to come

Notes:

Git incremental merge

Pushing back the conflict frontier

Still to come

Notes:

Blog Archive

Labels

About Me

`git/git`

`torvalds/linux`

`git merge` pain

`git rebase` pain