The many faces of git rebase

I am not writing this because I am a git master (I am not), but to point out something that took me an unreasonable amount of time to understand. Basically, I am going to show that "git rebase" plays several different roles:

a) rebasing, that is, making a sequence of commits coming from several branches look like a perfectly serialized development history, as if it had been written by a single person;

b) changing history and editing older commits;

c) "merging" into master without adding merge commits.

First, let's see the canonical usage of rebase. It all starts with a very simple program:

int main()
{
 printf("Hello world");
}

From this, two developers create separate development branches. The lucky one works on the 'features' branch, concerned only with new functionality. After a series of commits, she ends up with the following code:

int main()
{
        printf("Hello world !\n");
        printf("This is my first program\n");

        return 0;
}

The commit log of the 'features' branch (oldest commit first) is:

Added another message 030943f4052e8f5421baf59a3ce68be1ffb8ba17
Added return value fcc6d6a6f72192ad49bd925319746a96a73ce498
Added exclamation point and newline to msg ad184928f92ca8b347313cbe436

The unlucky programmer was charged with the 'bugfixes' branch. After some bugfixes, she has the following code:

#include <stdio.h>

int main(int argc, char *argv[])
{
        printf("Hello world\n");
}

And her commit log (oldest commit first) is:

90d1d91d55a4da36c1f27f625119c4c24db33d8e Fixed main() prototype
c244d58951a8683e8f5958d1307e153daba902e8 Added newline
a50cde08b410ba472c567e6da9bf5b2f29b9bd60 Added include file

OK, now we want to merge both branches back into the 'master' branch. We could use 'git merge' from the master branch:

$ git merge bugfixes
...
$ git merge features
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Automatic merge failed; fix conflicts and then commit the result.
$ vi main.c
$ git commit -a

The problem people have with merging is that merging branches that have diverged creates a new commit of its own (whether or not their changes conflict), which makes the commit history look "dirty":

commit 0bcbbf575f08986d0ea89aabf348c3bc008fb618
Merge: a50cde0 ad18492
Author: Elvis Pfutzenreuter 
Date:   Tue Jun 8 17:31:36 2010 -0300

    Merge branch 'features'
    
    Conflicts:
        main.c

The "git pull" command does the same thing, since by default it is a fetch followed by a merge. Sometimes merging is the only way to go, but if the original developers can still be reached, we can ask one of them to rebase her own patches.
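To see the extra commit appear without a conflict getting in the way, here is a minimal sketch, assuming a POSIX shell with git on the PATH. It builds a throwaway repository in a temporary directory; the file names (bugfix.txt, feature.txt) and commit messages are made up for the illustration. Because the two branches have diverged, the second merge records a merge commit even though nothing conflicts.

```shell
#!/bin/sh
# Sketch: merging two diverged branches records an extra merge commit,
# even when nothing conflicts. File names and messages are hypothetical.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the branch 'master' on any git
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'hello\n' > main.c
git add main.c
git commit -q -m "Initial program"

# Two branches start from the same commit and diverge.
git checkout -q -b bugfixes
printf 'fix\n' > bugfix.txt
git add bugfix.txt
git commit -q -m "A bugfix"

git checkout -q -b features master
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "A feature"

# Merge both into master, as in the session above.
git checkout -q master
git merge -q bugfixes               # fast-forward: no merge commit yet
git merge -q --no-edit features     # diverged: git creates a merge commit

git rev-list --merges --count HEAD  # prints 1
```

The first merge is a plain fast-forward; only the second one, between diverged histories, adds the merge commit.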

Let's accept the 'bugfixes' patches first, and then ask the 'features' team to rebase against the newest master:

$ git checkout bugfixes
$ git rebase master # just to be sure
$ git checkout master
$ git rebase bugfixes # "merging"
First, rewinding head to replay your work on top of it...
Fast-forwarded master to bugfixes.
$ git checkout features
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: Added another message
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Failed to merge in the changes.
Patch failed at 0001 Added another message

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase
--skip".
To restore the original branch and stop rebasing run "git rebase
--abort".

The conflict does not go away by itself. We fix it and continue, and the original patch "Added another message" is itself changed in the process:

$ vi main.c
$ git add main.c
$ git rebase --continue

Another conflict happens in the patch "Added exclamation point..." and we fix it the same way. In the end, the 'features' patches sit on top of the master/bugfix patches, as if the features team had waited for the bugfix team.

Often git can work out such differences by itself; human intervention is needed only when both branches changed the same lines of code (or neighboring lines) in different ways.
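Here is a scripted sketch of a rebase that stops on a conflict, assuming a POSIX shell with git on the PATH and a throwaway repository. The file contents and the chosen resolution are hypothetical; the printf line that overwrites main.c after the failed rebase stands in for the "vi main.c" step. GIT_EDITOR=true simply accepts the commit message git proposes, in case "git rebase --continue" wants to open an editor.

```shell
#!/bin/sh
# Sketch: a rebase stops on a conflict, the conflict is resolved (here by a
# scripted, hypothetical resolution), and the rebase continues.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'printf("Hello world");\n' > main.c
git add main.c
git commit -q -m "Initial program"

# Both branches change the same line in different ways.
git checkout -q -b bugfixes
printf 'printf("Hello world\\n");\n' > main.c
git commit -q -a -m "Added newline"
git checkout -q -b features master
printf 'printf("Hello world !\\n");\n' > main.c
git commit -q -a -m "Added exclamation point and newline"

git checkout -q master
git rebase -q bugfixes              # fast-forward master to bugfixes

git checkout -q features
if git rebase master; then          # expected to stop with a conflict
    echo "unexpected: no conflict" >&2
    exit 1
fi
# main.c now holds <<<<<<< / ======= / >>>>>>> markers. Write the resolved
# version (standing in for "vi main.c"), mark it resolved, and continue.
printf 'printf("Hello world !\\n");\n' > main.c
git add main.c
GIT_EDITOR=true git rebase --continue

git log --oneline features
```

After --continue, the feature commit sits directly on top of the bugfix commit, with no merge commit involved.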

Now master can 'merge' features too:

$ git checkout master
$ git rebase features
First, rewinding head to replay your work on top of it...
Fast-forwarded master to features.

The final log in master is:

Fixed main() prototype 90d1d91d55a4da36c1f27f625119c4c24db33d8e
Added newline c244d58951a8683e8f5958d1307e153daba902e8
Added include file a50cde08b410ba472c567e6da9bf5b2f29b9bd60
Added another message adf09430af144d3b6c14d5fc39ce26a867d2bd61
Added return value 44e0094853e2cf3c949cfa8a34665342309664e1
Added exclamation point and newline f999aa837539a7301b16534ca59a4c6ecc

The master log is now a perfectly serialized development history, which is easy to follow and understand.

Note that all feature-related commits got new SHA-1 hashes (commit IDs), while the bugfix commits kept their original ones. A commit ID is a hash of the commit's contents, and those contents include the ID of its parent commit. The feature commits had to be rewritten upon rebasing: they now sit on top of the bugfix commits and change a different main.c than the one the developer first worked on. The bugfix commits kept their IDs because they were already based on the latest master, so bringing them into master was a simple fast-forward.
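The whole workflow, including what happens to the commit IDs, can be reproduced with a small script. This is a sketch assuming a POSIX shell with git on the PATH; to keep it conflict-free, each branch touches its own hypothetical file, and the messages are made up.

```shell
#!/bin/sh
# Sketch of the rebase workflow: bugfixes lands first, features is rebased
# onto the new master, and master is fast-forwarded each time.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'hello\n' > main.c
git add main.c
git commit -q -m "Initial program"

git checkout -q -b bugfixes
printf 'fix\n' > bugfix.txt
git add bugfix.txt
git commit -q -m "A bugfix"

git checkout -q -b features master
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "A feature"

# Remember the commit IDs before any rebasing.
BUGFIX_BEFORE=$(git rev-parse bugfixes)
FEATURE_BEFORE=$(git rev-parse features)

git checkout -q master
git rebase -q bugfixes     # master is an ancestor of bugfixes: fast-forward
git checkout -q features
git rebase -q master       # replay the feature commit on top of the bugfix
git checkout -q master
git rebase -q features     # fast-forward again

git rev-list --merges --count master    # prints 0: no merge commits
git log --oneline master                # three commits in a straight line

# The bugfix commit kept its ID; the replayed feature commit got a new one.
[ "$(git rev-parse bugfixes)" = "$BUGFIX_BEFORE" ] && echo "bugfix ID unchanged"
[ "$(git rev-parse features)" != "$FEATURE_BEFORE" ] && echo "feature ID changed"
```

When master is an ancestor of the branch being brought in, "git merge --ff-only <branch>" moves master the same way as the "git rebase <branch>" calls above, and it refuses (rather than creating a merge commit) when a fast-forward is not possible.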

We have also incorporated the branch commits into master using rebase. This sounds confusing at first: we are using the same tool to do two conceptually different things (e.g. rewriting commits and merging them into the mainline).

Many seasoned Git developers treat 'master' as 'untouchable': nothing is ever committed to it directly. It only receives commits/patches from other branches, and it always receives them through rebase, in order to avoid the 'merge commits' that merging or pulling diverged branches would create.

But that's not the end of the story. We can also use rebase to edit a commit buried deep in history. Of course, editing the very last commit is easy: just use git commit --amend. But let's say that we want to edit the "Fixed main() prototype" commit so that it uses char **argv instead of char *argv[]. What now?

There is a "stupid" way to do that: export the newer patches as text files, remove them from the repository, edit what you want, and re-apply the patches from the text files:

$ git format-patch -n HEAD~5
0001-Added-newline.patch
0002-Added-include-file.patch
0003-Added-another-message.patch
0004-Added-return-value.patch
0005-Added-exclamation-point-and-newline.patch
$ git reset --hard HEAD~5
HEAD is now at 90d1d91 Fixed main() prototype
$ vi main.c
$ git commit -a --amend
$ git am -3 000*
Applying: Added newline
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging main.c
Applying: Added include file
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Failed to merge in the changes.
Patch failed at 0002 Added include file
When you have resolved this problem run "git am -3 --resolved".
If you would prefer to skip this patch, instead run
"git am -3 --skip".
To restore the original branch and stop patching run
"git am -3 --abort".

$ vi main.c
$ git am -3 --resolved

I used "git am -3" so that, when a patch does not apply cleanly, git falls back to a three-way merge and leaves conflict markers in main.c wherever human intervention is needed.
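A scripted sketch of the same steps, assuming a POSIX shell with git on the PATH. The file names and commit messages are hypothetical, and the patch files go to a temporary directory instead of the current one. The newer commits touch different files here, so git am applies them without conflicts.

```shell
#!/bin/sh
# Sketch of the "stupid" way: export newer commits as patches, rewind,
# amend the old commit, then re-apply the patches with git am -3.
set -e
REPO=$(mktemp -d)
PATCHES=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'int main(int argc, char *argv[])\n' > proto.txt
git add proto.txt
git commit -q -m "Fixed main() prototype"
for name in newline include; do
    printf '%s\n' "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "Added $name"
done

# 1. Export the two newest commits as numbered patch files.
git format-patch -o "$PATCHES" HEAD~2
# 2. Rewind the branch to the commit we want to change.
git reset -q --hard HEAD~2
# 3. Edit that commit and amend it.
printf 'int main(int argc, char **argv)\n' > proto.txt
git commit -q -a --amend --no-edit
# 4. Re-apply the patches, falling back to a 3-way merge if needed.
git am -q -3 "$PATCHES"/*.patch

git log --oneline
```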

But, as I said, this is the "stupid" (albeit safe and valid) way to do it. We can do the same thing with git rebase. The first step is:

git rebase -i HEAD~6

"What? Rebasing onto a commit of my own history???" That's what I thought when I first saw the trick, and it took me some time to understand it. In this case the "rebasing" effect is innocuous: HEAD~6 is already the base of the last six commits, so replaying them onto it changes nothing by itself. We only use rebase because we want the "-i" (interactive) flag, which lets us edit history. That command opens a text editor with the following contents:

pick 90d1d91 Fixed main() prototype
pick c244d58 Added newline
pick a50cde0 Added include file
pick adf0943 Added another message
pick 44e0094 Added return value
pick f999aa8 Added exclamation point and newline to Hello World msg

You can change this text to some extent, and git rebase will act accordingly. In this case we only want to change the first (oldest) commit, 90d1..., so we change its line to:

edit 90d1d91 Fixed main() prototype

and save it, which causes the following:

Stopped at 90d1d91... Fixed main() prototype
You can amend the commit now, with

 git commit --amend

Once you are satisfied with your changes, run

 git rebase --continue

Now we are free to do whatever we want with the code, and all changes will be folded into the old commit. But keep in mind that the more recent commits will have to be rewritten because of that, and new conflicts may arise:

$ vi main.c
$ git add main.c
$ git rebase --continue

[detached HEAD 132cb13] Fixed main() prototype
 1 files changed, 1 insertions(+), 1 deletions(-)
Automatic cherry-pick failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>', and
run 'git rebase --continue'
Could not apply a50cde0... Added include file

I said it could happen, and indeed a more recent commit conflicted with the edited version of the older commit. We need to resolve the conflict manually and move on:

$ vi main.c
$ git add main.c
$ git rebase --continue

[detached HEAD 3283fda] Added include file
 1 files changed, 2 insertions(+), 0 deletions(-)
Successfully rebased and updated refs/heads/master.
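The same edit can be scripted, which also shows what the editor step does. This sketch assumes a POSIX shell, git on the PATH, and a sed that accepts -i.bak (GNU and BSD sed both do); the file names and messages are hypothetical. GIT_SEQUENCE_EDITOR is the environment variable git uses for the todo-list editor, so a sed command can play the human and turn the first "pick" into "edit".

```shell
#!/bin/sh
# Sketch: mark the oldest of three commits as "edit" without opening an
# editor, amend it, and let rebase replay the newer commits on top.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'base\n' > base.txt
git add base.txt
git commit -q -m "Initial program"
printf 'int main(int argc, char *argv[])\n' > proto.txt
git add proto.txt
git commit -q -m "Fixed main() prototype"
for name in newline include; do
    printf '%s\n' "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "Added $name"
done

# The todo list starts with the oldest commit; sed turns its "pick" into
# "edit", and rebase stops right after applying that commit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i HEAD~3

# We are now sitting on "Fixed main() prototype": change it and amend.
printf 'int main(int argc, char **argv)\n' > proto.txt
git commit -q -a --amend --no-edit
git rebase --continue      # replays "Added newline" and "Added include"

git log --oneline
```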

Interactive rebasing is not limited to editing a single commit. You can mark several commits for editing, change the order of commits in history, or 'squash' a commit (fold it into the previous one, so it no longer appears as a separate commit). If you delete a commit's line, the commit itself will be removed from the branch and forgotten (watch out!).
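Squashing can be scripted the same way. A sketch under the same assumptions (POSIX shell, git on the PATH, sed with -i.bak, hypothetical files and messages): the second line of the todo list is changed from "pick" to "squash", and GIT_EDITOR=true accepts the combined commit message without opening an editor.

```shell
#!/bin/sh
# Sketch: squash the newest commit into the one before it, non-interactively.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

for name in base newline include; do
    printf '%s\n' "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "Added $name"
done

# For HEAD~2 the todo list reads:
#   pick <id> Added newline
#   pick <id> Added include
# Line 2 becomes "squash", folding "Added include" into "Added newline".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -q -i HEAD~2

git log --oneline          # two commits remain
```

Swapping the two pick lines instead would reorder the commits, and deleting a line would drop that commit.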

So, there are no excuses not to present a clean and logical development history :)

It is best to reserve these "history rewriting" tricks for your private, unpublished branches. It is a very bad idea to edit an "old" patch that has already been sent upstream, or to squash or reorder such patches; any change to commits already sent upstream tends to be a bad idea, because your local private repository and the remote public repository now tell different stories, and they are not easily reconciled. The options will then be forcing the local version with push -f, or merging (which we had been so dutifully avoiding). Neither option is good if the repository is shared by many people.

CAUTION: several people I know, including some very proficient in Git, have complained that "git ate two days' worth of my work!". It is easy to make mistakes when rebasing or merging, and you may end up losing something. It has happened to me 3 or 4 times already. Luckily, every time I had the patches saved in text format (git format-patch) or they had been pushed to a remote repository, so I could fetch them back. Make sure you have some form of backup before you play with rebasing!
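Besides patch files and remote copies, one cheap safety net (not one the text above relies on) is a throwaway branch created just before rewriting history. This sketch assumes a POSIX shell, git on the PATH and sed with -i.bak; the branch name backup-before-rebase and the files are hypothetical. It deliberately drops a commit during an interactive rebase, then restores the original history from the backup branch.

```shell
#!/bin/sh
# Sketch: keep a backup branch before rewriting history, then use it to
# undo a rebase that went wrong.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

for name in base newline include; do
    printf '%s\n' "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "Added $name"
done

# Safety net: a (hypothetically named) branch at the current commit.
git branch backup-before-rebase

# A "mistake": delete the second todo line, dropping "Added include".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2d'" git rebase -q -i HEAD~2
git rev-list --count HEAD   # prints 2: the commit is gone from master

# The backup branch still points at the old history, so restore it.
git reset -q --hard backup-before-rebase
git rev-list --count HEAD   # prints 3
```

Even without such a branch, git reflog usually still lists the pre-rebase commit for some time, which is often enough to recover it.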