Git-Fu: reposurgeon

Dmitry Yakimenko ・3 min read

In a comment on my previous post, Harvey Thompson mentioned a tool called reposurgeon that is supposed to take care of the exact problem I had (merging a bunch of Git repos). I installed it and tried it out. I won't keep you in suspense: it didn't work for me.

I spent a bunch of time going through the giant documentation page, which manages to be both detailed and vague at the same time. It feels too formal and yet doesn't give you a good idea of what the whole thing is about. There's a lot of terminology that is unique to this tool and not much of an introduction to the lingo. There are very few examples of anything. There's no quick start section covering the trivial cases, so it's quite difficult to figure out where to start. I also didn't find a lot of resources online. Hardly any, actually. Not many people seem to use this tool.

I'm sure reposurgeon is a beast. It seems to have more commands and switches than git, openssl and mogrify combined. It has a query language that would make Oracle green with envy. But for the life of me, I couldn't figure out how to use any of it. It's just not intuitive at all. Some commands are pretty arbitrary, like append, which lets you append text to the commit message. Yet there's no prepend.

The query language is quite obscure and doesn't resemble anything I'm familiar with. Here's an example from the docs:

define lastchange {
    @max(=B & [/ChangeLog/] & /{0}/B)? list
}

And the description:

List the last commit that refers to a ChangeLog file containing a specified string. (The trick here is that ? extends the singleton set consisting of the last eligible ChangeLog blob to its set of referring commits, and list only notices the commits.)

That makes zero sense to me, though I've spent quite a bit of time trying to understand the docs.

To me this looks like a well-run project. It seems to have a good pace of releases, vast and detailed documentation, and clean, documented source code (I wouldn't put everything in one 21k line file though).

What I think went wrong is that it has too many very specific features and tries to handle everything. The author(s) have probably been spending too much time in their silo, perfecting the software and adding features to it.

I think they should have been spending more time on Stack Overflow answering questions about reposurgeon, writing introductions to the tool and blogging about it. I understand this tool is pretty niche and is designed for some very special use cases, but that should not mean "experts only". A casual user should not be intimidated by the advanced features right away. It should be possible to do a trivial conversion with some minimal command set.

Look at Git. I know it's a train wreck of a user experience. It's got commands for everything. Mastering Git could take a lifetime. But if you stick to a pull-add-commit-push workflow, it's not that bad. Powerful shouldn't mean unapproachable.
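
For reference, that minimal loop might look something like the sketch below. Everything specific in it is made up for illustration: the local bare repository standing in for a remote, the file name, and the throwaway commit identity. It runs in a temporary directory so it doesn't touch any real repository.

```shell
#!/bin/sh
set -eu

# Scratch area so the sketch does not touch any real repository.
work=$(mktemp -d)
cd "$work"

# Throwaway identity (hypothetical) so commits work without global config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# A local bare repository stands in for the remote.
git init --quiet --bare remote.git
git clone --quiet remote.git clone
cd clone

# The everyday loop: edit, add, commit, push.
echo "hello" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD

# pull picks up whatever others have pushed; here there is nothing new.
branch=$(git rev-parse --abbrev-ref HEAD)
git pull --quiet --ff-only origin "$branch"

git log --oneline
```

Four commands cover the day-to-day case; the rest of Git can wait until it's actually needed.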


Harvey Thompson

Yes, I also had similar issues with trying to learn reposurgeon.

In the end it was "easier" (perhaps not faster, but fewer headaches) to use git commands and lots of Google/Stack Overflow searching.

What I did take from reading up on reposurgeon is that it's a good idea to iterate on a solution:

  • A set of scripts checked into a separate git repository.
  • The scripts start with the source unmerged git repositories and nothing else.
  • The scripts never modify the source unmerged git repositories.
  • The scripts' product is a merged git repository.

I ended up iterating on a Makefile which contained various git commands to take the source repositories (I think there were 13?).

At each step I added more git commands to filter commits, add branches and tags, add commit messages to map original commits to new commits, etc.

Once I was fairly happy with the result, I started trying to use the merged repo, found a few mistakes and had to rebuild it from scratch again, and keep iterating.

Took under a week to do, and theoretically I could run that script again at any time (with possible fixes) to produce an alternative branch, if I ever find issues.
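
The approach described above could be sketched roughly as follows. This is not the Makefile from the comment: it is a small shell sketch that uses the subtree merge recipe from the Git documentation (`merge -s ours` followed by `read-tree --prefix`), and the source repositories `alpha` and `beta`, the `merged` directory and the commit identity are all hypothetical stand-ins.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Throwaway identity (hypothetical) so commits work without global config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Two tiny stand-in source repositories (hypothetical names).
mkdir -p sources
for name in alpha beta; do
    git init --quiet "sources/$name"
    echo "$name" > "sources/$name/README"
    git -C "sources/$name" add README
    git -C "sources/$name" commit --quiet -m "Initial commit of $name"
done

# Rebuild the merged repository from scratch on every run,
# so fixing the script and re-running it is always safe.
build_merged() {
    rm -rf merged
    git init --quiet merged
    # merge needs an existing commit to merge into.
    git -C merged commit --quiet --allow-empty -m "Start merged repository"
    for src in sources/*; do
        name=$(basename "$src")
        # fetch only reads from the source; the source is never modified.
        git -C merged fetch --quiet "$work/$src" HEAD
        # Record the source history as a parent without touching our tree...
        git -C merged merge --quiet -s ours --no-commit \
            --allow-unrelated-histories FETCH_HEAD
        # ...then place the source files under a subdirectory named after it.
        git -C merged read-tree --prefix="$name/" -u FETCH_HEAD
        git -C merged commit --quiet -m "Merge $name into $name/"
    done
}

build_merged
git -C merged log --oneline --graph
```

Filtering commits, recreating branches and tags, or annotating commit messages would be further steps added inside `build_merged`, one at a time, with a full rebuild after each change.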

It's indeed a shame that reposurgeon is super-niche. I suspect that's why the author charges consultancy fees (which is fair enough).

Dmitry Yakimenko Author

Yeah, I also iterate on scripts and use git to keep track of progress. I don't do mkdir ... anymore for new projects; I just say git init ....

Mihail Malo

I needed to merge 5 git repos into a monorepo this weekend, and I felt like being fancy and keeping history (wasn't strictly necessary in this case), so I just did this by hand: saintgimp.org/2013/01/22/merging-t...

It wasn't too bad.

Gopi Ravi

The link is broken

Mihail Malo

Thank you!
Not anymore :D

Jesse Phillips

From the documentation it sounds like a tool for someone who manages database translations and history pruning as their career.

Dmitry Yakimenko Author

Clearly not me =)