epx.com.br
 

The many faces of git rebase

I am not writing this because I am a git master (I am not), but to point out something that took me an unreasonable amount of time to understand: "git rebase" plays several completely different roles, not just one:

a) rebasing, that is, making a sequence of commits coming from several branches appear like a perfectly serialized development effort, as if it were done by a single person;

b) changing history and editing older commits;

c) "merging" into master without adding merge commits.

First, let's see the canonical usage of rebase. Let's say someone begins with a very simple program:

int main()
{
 printf("Hello world");
}

From that point, two guys create separate development branches. The lucky one works on the 'features' branch, which only adds new functionality. After a series of commits, he ends up with the following code:

int main()
{
        printf("Hello world !\n");
        printf("This is my first program\n");

        return 0;
}

The commit log of the 'features' branch (oldest commit first) is:

Added another message 030943f4052e8f5421baf59a3ce68be1ffb8ba17
Added return value fcc6d6a6f72192ad49bd925319746a96a73ce498
Added exclamation point and newline to msg ad184928f92ca8b347313cbe436970630493b36f

The unlucky programmer got the 'bugfixes' branch, the boring stuff, and after some fixes he has the following code:

#include <stdio.h>

int main(int argc, char *argv[])
{
        printf("Hello world\n");
}

And his commit log (oldest commit first) is:

90d1d91d55a4da36c1f27f625119c4c24db33d8e Fixed main() prototype
c244d58951a8683e8f5958d1307e153daba902e8 Added newline
a50cde08b410ba472c567e6da9bf5b2f29b9bd60 Added include file

OK, now we want to merge both branches back into the 'master' branch. We could run 'git merge' from master:

$ git merge bugfixes
...
$ git merge features
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Automatic merge failed; fix conflicts and then commit the result.
$ vi main.c
$ git commit -a

The problem many developers see here is that a merge of branches whose histories have diverged creates a merge commit of its own, and in a non-trivial merge (one with conflicts) the commit message even lists them. This makes the log "dirty":

commit 0bcbbf575f08986d0ea89aabf348c3bc008fb618
Merge: a50cde0 ad18492
Author: Elvis Pfutzenreuter 
Date:   Tue Jun 8 17:31:36 2010 -0300

    Merge branch 'features'
    
    Conflicts:
        main.c

The "git pull" command does the same thing, since by default it is a fetch followed by a merge. Sometimes a merge commit is the only way to go, but if the original developers can still be reached, we can ask one of them to rebase his own patches.
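To see which merges actually produce merge commits, here is a small sketch that rebuilds a miniature version of this scenario in a throwaway repository. It assumes git and mktemp are installed; the file names, commit messages, the commit() helper and the demo identity are all hypothetical. Merging a branch into a master that has not moved is a fast-forward and adds no commit, while merging a branch whose history has diverged records a merge commit with two parents, even when there is no conflict.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch 'master', as in the article
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > main.c && commit 'Initial program'
git branch bugfixes
git branch features

git checkout -q bugfixes
echo '#include <stdio.h>' > include.h && commit 'Added include file'
git checkout -q features
echo 'This is my first program' > message.txt && commit 'Added another message'

git checkout -q master
git merge -q --ff bugfixes               # master has not moved: fast-forward, no new commit
merges_after_ff=$(git rev-list --merges --count HEAD)
git merge -q --no-edit features          # histories diverged: git records a merge commit
merges_after_merge=$(git rev-list --merges --count HEAD)

echo "merge commits after fast-forward: $merges_after_ff"
echo "merge commits after real merge:   $merges_after_merge"
git --no-pager log --oneline --graph
```

`git rev-list --merges --count HEAD` counts the merge commits reachable from HEAD, and `HEAD^2` names the second parent of the merge commit (the tip of 'features').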

Let's adopt the 'bugfixes' patches first, and then ask 'features' to rebase against newest master:

$ git checkout bugfixes
$ git rebase master # just to be sure
$ git checkout master
$ git rebase bugfixes # "merging"
First, rewinding head to replay your work on top of it...
Fast-forwarded master to bugfixes.
$ git checkout features
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: Added another message
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Failed to merge in the changes.
Patch failed at 0001 Added another message

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase --skip".
To restore the original branch and stop rebasing run "git rebase --abort".

The conflict does not go away by itself. We fix it and continue, and in the process the original patch "Added another message" is itself changed:

$ vi main.c
$ git add main.c
$ git rebase --continue

Another conflict will happen in the patch "Added exclamation point..." and we fix it the same way. In the end, the 'features' patches will sit on top of the master/bugfix patches, as if the features developer had waited for the bug fixes to be ready before doing anything new.
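The stop, fix, continue loop can be reproduced in a throwaway repository. This is a sketch, assuming git and mktemp are installed; the file msg.txt, the branch contents, the commit() helper and the demo identity are hypothetical, and the conflict is resolved by writing the combined line from the script instead of with vi. GIT_EDITOR=true is there only so that git keeps the original commit message without opening an editor.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > msg.txt && commit 'Initial program'
git branch features

# master (already holding the bug fixes) and features change the same line differently.
echo 'Hello world (fixed)' > msg.txt && commit 'Added newline'
git checkout -q features
echo 'Hello world !' > msg.txt && commit 'Added exclamation point'

# The rebase stops at the conflicting commit and exits with a non-zero status.
if git rebase -q master >/dev/null 2>&1; then
    echo 'expected a conflict' >&2
    exit 1
fi

# Resolve by hand (here: write the combined line), stage the file and continue.
echo 'Hello world ! (fixed)' > msg.txt
git add msg.txt
GIT_EDITOR=true git rebase --continue

git --no-pager log --oneline
```

After the loop, 'features' is a single commit sitting directly on top of master, carrying the resolved content.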

Sometimes git can resolve these conflicts by itself; human intervention is needed only when two branches touched the same (or adjacent) lines of code and did different things to them.
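A quick way to convince yourself of this is to rebase two branches onto a master that changed line 1 of a five-line file: one branch edits line 5, the other edits line 1 differently. The sketch below assumes git and mktemp are installed and uses a throwaway repository; the file, branch names (rival is hypothetical), commit() helper and identity are made up. The first rebase merges cleanly; the second stops with a conflict, which the script undoes with git rebase --abort.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > main.c && commit 'Initial program'
git branch features
git branch rival

# master changes line 1.
printf 'LINE 1\nline 2\nline 3\nline 4\nline 5\n' > main.c && commit 'master: edit line 1'

# features changes only line 5, well away from line 1: the rebase merges cleanly.
git checkout -q features
printf 'line 1\nline 2\nline 3\nline 4\nLINE 5\n' > main.c && commit 'features: edit line 5'
git rebase -q master
clean_result=$(cat main.c)

# rival changes line 1 differently: git cannot choose, so the rebase stops.
git checkout -q rival
printf 'Line one\nline 2\nline 3\nline 4\nline 5\n' > main.c && commit 'rival: edit line 1'
if git rebase -q master >/dev/null 2>&1; then
    rival_conflict=no
else
    rival_conflict=yes
    git rebase --abort            # restore rival as it was before the rebase
fi

printf '%s\n' "$clean_result"
echo "conflict on rival: $rival_conflict"
```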

Now master can 'merge' from features too:

$ git checkout master
$ git rebase features
First, rewinding head to replay your work on top of it...
Fast-forwarded master to features.

The final log in master (oldest commit first) is:

Fixed main() prototype 90d1d91d55a4da36c1f27f625119c4c24db33d8e
Added newline c244d58951a8683e8f5958d1307e153daba902e8
Added include file a50cde08b410ba472c567e6da9bf5b2f29b9bd60
Added another message adf09430af144d3b6c14d5fc39ce26a867d2bd61
Added return value 44e0094853e2cf3c949cfa8a34665342309664e1
Added exclamation point and newline f999aa837539a7301b16534ca59a4c6ecc102deb

The master log is a perfectly serialized development history that anybody can follow.

Note that all feature-related commits now have different SHA-1 hashes (commit IDs), while the bugfix commits kept their original ones. The feature commits had to be rewritten by the rebase: they now have a different parent and change a different main.c than the one the developer originally worked on. The bugfix commits kept their IDs because they were already based on the latest master.
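The change of commit IDs can be observed directly. The sketch below (assumptions: git and mktemp installed, throwaway repository, hypothetical files, commit() helper and identity) fast-forwards master to a bugfix branch with git rebase, as the article does, then rebases a feature branch onto it and compares IDs. It also uses git patch-id, which hashes only the diff, to show that the rewritten feature commit still carries the same change.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'int main()' > main.c && commit 'Initial program'
git branch bugfixes
git branch features
git checkout -q bugfixes
echo '#include <stdio.h>' > include.h && commit 'Added include file'
git checkout -q features
echo 'This is my first program' > message.txt && commit 'Added another message'

fix_id=$(git rev-parse bugfixes)
feature_old=$(git rev-parse features)

# master is an ancestor of bugfixes, so this "rebase" is just a fast-forward.
git checkout -q master
git rebase -q bugfixes

# features must be replayed on a new parent, which gives its commit a new ID.
git checkout -q features
git rebase -q master
feature_new=$(git rev-parse features)

# git patch-id hashes only the diff, so it is equal for both versions of the commit.
patch_old=$(git show "$feature_old" | git patch-id | cut -d' ' -f1)
patch_new=$(git show "$feature_new" | git patch-id | cut -d' ' -f1)

echo "bugfix commit:  $fix_id (master now points here)"
echo "feature commit: $feature_old -> $feature_new"
echo "patch-id:       $patch_old / $patch_new"
```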

We have also incorporated the branch commits into master using rebase. This sounds confusing at first: we are using the same tool for two conceptually different things, namely rewriting commits and merging them into the mainline.

Most seasoned Git developers tend to see 'master' as 'untouchable': nothing is ever committed directly to it. It only receives commits from other branches, and they are always admitted through rebase, to avoid the merge commits that merge and pull would create.
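For the admission step itself, git merge --ff-only is an alternative to running git rebase on master: it only moves master forward, and when the branch has diverged it refuses instead of creating a merge commit. This is a swapped-in command, not the one the article uses. The sketch assumes git and mktemp are installed and uses a throwaway repository with hypothetical branch contents, commit() helper and identity.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > main.c && commit 'Initial program'
git branch bugfixes
git branch features
git checkout -q bugfixes
echo '#include <stdio.h>' > include.h && commit 'Added include file'
git checkout -q features
echo 'This is my first program' > message.txt && commit 'Added another message'

git checkout -q master
git merge -q --ff-only bugfixes        # master is an ancestor of bugfixes: accepted

# features has diverged from the new master: --ff-only refuses rather than merging.
if git merge -q --ff-only features 2>/dev/null; then
    features_accepted=yes
else
    features_accepted=no
fi

# After replaying features on top of master, the fast-forward is possible.
git checkout -q features
git rebase -q master
git checkout -q master
git merge -q --ff-only features

git --no-pager log --oneline
```

The end result is the same linear history the article obtains with git rebase, with no merge commits in master.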

But that's not the end of the story. We can also use rebase to edit a commit deeply buried in history. Editing the very last commit is easy, of course: use git commit --amend. But say we want to edit "Fixed main() prototype" to use char **argv instead of char *argv[]. What now?

There is a "stupid" way to do that: export the newer commits as text patches, remove them from the repository, edit what you want, and re-apply the patches from the text files:

$ git format-patch -n HEAD~5
0001-Added-newline.patch
0002-Added-include-file.patch
0003-Added-another-message.patch
0004-Added-return-value.patch
0005-Added-exclamation-point-and-newline.patch
$ git reset --hard HEAD~5
HEAD is now at 90d1d91 Fixed main() prototype
$ vi main.c
$ git commit -a --amend
$ git am -3 000*
Applying: Added newline
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging main.c
Applying: Added include file
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Failed to merge in the changes.
Patch failed at 0002 Added include file
When you have resolved this problem run "git am -3 --resolved".
If you would prefer to skip this patch, instead run "git am -3 --skip".
To restore the original branch and stop patching run "git am -3 --abort".

$ vi main.c
$ git add main.c
$ git am -3 --resolved

I used "git am -3" so that, when a patch does not apply cleanly, git falls back to a three-way merge, and if that still needs human intervention, it leaves the conflict marked inside main.c.
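The same export, reset, amend and re-apply sequence can be scripted. This sketch assumes git and mktemp are installed and works in a throwaway repository; to keep it conflict-free, the prototype lives in a hypothetical proto.c and the later commits touch other files, so git am -3 applies cleanly. The commit() helper and identity are hypothetical too. The patches are written outside the working tree so that git reset --hard cannot touch them.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > main.c && commit 'Initial program'
echo 'int main(int argc, char *argv[])' > proto.c && commit 'Fixed main() prototype'
echo 'newline' > newline.txt && commit 'Added newline'
echo '#include <stdio.h>' > include.h && commit 'Added include file'

# 1. Export the commits newer than the one to edit as numbered patch files.
git format-patch -q -o "$tmp/patches" HEAD~2
# 2. Drop them from the branch; HEAD is now 'Fixed main() prototype'.
git reset -q --hard HEAD~2
# 3. Edit the file and amend the old commit.
echo 'int main(int argc, char **argv)' > proto.c
git commit -q -a --amend --no-edit
# 4. Re-apply the patches; -3 falls back to a three-way merge if a patch does not apply.
git am -q -3 "$tmp"/patches/*.patch

git --no-pager log --oneline
```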

But, as I said, this is the "stupid" (albeit safe) way to do it. We can do the same with git rebase. The first step is:

git rebase -i HEAD~6

"What? Rebasing onto a commit from my own history???" That's what I thought when I first saw this trick, and it took me a while to understand it. In this case the "rebasing" effect is innocuous: the commits already sit on top of HEAD~6, so rebasing them there by itself changes nothing. We only use rebase because of its "-i" (interactive) flag, which lets us edit history. The command opens an editor (vi, in my case) with the following contents:

pick 90d1d91 Fixed main() prototype
pick c244d58 Added newline
pick a50cde0 Added include file
pick adf0943 Added another message
pick 44e0094 Added return value
pick f999aa8 Added exclamation point and newline to Hello World msg

This list can be edited, within limits, and git rebase will act accordingly. Here we only want to change the first (oldest) commit, 90d1d91, so we change its line to:

edit 90d1d91 Fixed main() prototype

and save the file, which produces:

Stopped at 90d1d91... Fixed main() prototype
You can amend the commit now, with

 git commit --amend

Once you are satisfied with your changes, run

 git rebase --continue

Now we are free to change the code as we like, and the changes will go into that old commit. Keep in mind, though, that the more recent commits will have to be rewritten because of that, and new conflicts may arise:

$ vi main.c
$ git add main.c
$ git rebase --continue

[detached HEAD 132cb13] Fixed main() prototype
 1 files changed, 1 insertions(+), 1 deletions(-)
Automatic cherry-pick failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>', and
run 'git rebase --continue'
Could not apply a50cde0... Added include file

As predicted, a more recent commit conflicted with the edited version of the older commit. We resolve the conflict manually and move on:

$ vi main.c
$ git add main.c
$ git rebase --continue

[detached HEAD 3283fda] Added include file
 1 files changed, 2 insertions(+), 0 deletions(-)
Successfully rebased and updated refs/heads/master.
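The same edit can be driven from a script, which is handy for experimenting without risk. This sketch assumes git, mktemp and a sed that accepts -i.bak (GNU and BSD sed both do). GIT_SEQUENCE_EDITOR is the environment variable git uses to pick the program that edits the todo list; here a sed command replaces the human and turns the first pick into edit. File names, messages, the commit() helper and the identity are hypothetical, and the later commits touch other files so that no conflict arises.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > main.c && commit 'Initial program'
echo 'int main(int argc, char *argv[])' > proto.c && commit 'Fixed main() prototype'
echo 'newline' > newline.txt && commit 'Added newline'
echo '#include <stdio.h>' > include.h && commit 'Added include file'

# Stand-in for the human editor: mark the first todo line (the oldest commit) as 'edit'.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick /edit /'" git rebase -q -i HEAD~3

# The rebase has stopped at 'Fixed main() prototype': change it and amend that commit.
echo 'int main(int argc, char **argv)' > proto.c
git add proto.c
git commit -q --amend --no-edit

# Replay the newer commits on top of the amended one.
git rebase --continue

git --no-pager log --oneline
```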

Interactive rebasing is not limited to editing a single commit. We can mark several commits for editing, change the order of commits in history, or 'squash' a commit (fuse it into the previous one, so it no longer appears as a separate commit). If you delete a commit's line, the commit itself is removed and forgotten (beware!).
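Squashing can be scripted the same way. In this sketch (git, mktemp and sed assumed; repository, files, messages, commit() helper and identity hypothetical), sed rewrites the second line of the todo list from pick to squash, so "Fix typo in message" is fused into "Added another message"; reordering would simply mean moving lines around. GIT_EDITOR=true accepts the combined commit message that git proposes. The final tree is identical; only the history is shorter.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > main.c && commit 'Initial program'
echo 'Hello wrold' > message.txt && commit 'Added another message'
echo 'Hello world' > message.txt && commit 'Fix typo in message'
echo 'return 0;' > return.txt && commit 'Added return value'

tree_before=$(git rev-parse 'HEAD^{tree}')

# Turn line 2 of the todo list ('Fix typo in message') into 'squash', fusing it into
# the commit on line 1. GIT_EDITOR=true accepts the combined message as proposed.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick /squash /'" GIT_EDITOR=true git rebase -q -i HEAD~3

git --no-pager log --oneline
```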

So, there are no excuses not to present a clean and logical development history :)

It is a very bad idea to edit, squash, or reorder an "old" commit that has already been sent upstream, because your history will then no longer match the remote repository's. Use this trick only in your private branches, before integration.

CAUTION: several people I know, including guys very proficient in Git, have complained that "git ate two days' worth of my work!". It is easy to make mistakes with rebase and reset --hard and end up losing commits.

It has happened to me 3 or 4 times already. Luckily, every time I had the patches saved in text format (git format-patch) or they had been pushed to a remote repository, so I could fetch them back. Make sure you have some form of backup before you play with rebasing!
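One cheap form of backup is a branch left pointing at the current tip before a risky rebase or reset, optionally together with a format-patch export like the one above. A sketch, assuming git and mktemp are installed and using a throwaway repository; the branch name backup-before-rebase, the commit contents, the commit() helper and the identity are hypothetical.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Demo-only identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

echo 'Hello world' > main.c && commit 'Initial program'
echo 'fix' > fix.txt && commit 'Fixed main() prototype'
echo 'feature' > feature.txt && commit 'Added another message'
original_tip=$(git rev-parse HEAD)

# Before risky history editing: keep a branch at the current tip...
git branch backup-before-rebase
# ...and a text copy of the recent commits, stored outside the repository.
git format-patch -q -o "$tmp/backup-patches" HEAD~2

# The mistake: a reset --hard that throws away the two newest commits.
git reset -q --hard HEAD~2

# The recovery: the backup branch still points at the "lost" commits.
git reset -q --hard backup-before-rebase
git --no-pager log --oneline
```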

