IT Minds

How to split and merge multiple git repositories while keeping the history

Benjamin Ørskov with Rasmus Witt Jensen

Whenever we create new projects and repositories, we do our best to make the naming and grouping match the world we are in. Even if we get everything right to begin with, the world around us changes. As time goes on, new features get implemented, and we might end up with a repository that contains more than it should, or one that is missing something that was put in another repository. As consultants we have seen this many times. It makes it difficult for new developers to find the code they need to work on and to understand what is going on. The team is often aware of the problem but afraid of tackling it. The same concerns come up every time: How are we going to keep our commit history? We don't have a git wizard if something goes wrong. It is going to take too long. The list goes on.
Luckily, there is a solution. Yes, it involves git, but bear with us.

Given two or more repositories, each containing several projects, we want to create a single new repository that contains selected projects from each of them. We want to keep the git history of every project we move. The result is a repository that contains the complete history of each of its files and no unrelated history.

This works with as many repositories and projects as you want, but in the following example we take two projects from each of two repositories, which gives four projects in total.

Git has a command called filter-branch that enables you to do this. However, filter-branch has a lot of pitfalls and is extremely slow, and the git documentation itself recommends using a tool called git filter-repo instead.

But enough talk, let’s get started!

First, some prerequisites:

  1. Python: https://www.python.org/downloads/
  2. Scoop package manager: https://scoop.sh/
  3. Git: https://git-scm.com/downloads
  4. Git filter-repo: https://github.com/newren/git-filter-repo/

We will start off by installing git filter-repo, which we will use to split our initial repositories. This can be a bit tricky, so we will guide you through it:

  1. Install git-filter-repo with scoop from an elevated PowerShell prompt: scoop install git-filter-repo
  2. Run git filter-repo to test that it works.
    1. If you get an error or nothing happens, git-filter-repo might be pointing at the wrong Python executable.
    2. Open C:\Users\[USER]\scoop\apps\git-filter-repo\[version]\git-filter-repo.ps1 and, on line 1, change python3 to python if that is how your Python installation is invoked.
    3. Run git filter-repo once more to test that it works.
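If you prefer to check the setup from a script rather than by eye, a minimal Python sketch could look like this. The helper names tool_works and check_prerequisites are hypothetical, and the sketch assumes your filter-repo release supports the --version flag; any non-zero exit code is treated as a missing or broken install.

```python
import shutil
import subprocess


def tool_works(cmd: list[str]) -> bool:
    """Return True if cmd starts and exits with status 0."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable itself (for example git) is not on PATH.
        return False
    return result.returncode == 0


def check_prerequisites() -> dict[str, bool]:
    """Check the tools used in this guide (Python is implied by running this script)."""
    return {
        "git": shutil.which("git") is not None,
        # Same idea as step 2 above: does `git filter-repo` start at all?
        "git filter-repo": tool_works(["git", "filter-repo", "--version"]),
    }


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'OK' if ok else 'missing or broken'}")
```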

Now, the overall approach we will be using is this:

  • Step A:
    For each of the two repositories, filter away all files that do not belong to the projects we want to merge.

  • Step B:
    Merge the two filtered repositories, which now contain only the projects we want, into a new repository.

Step A is to remove unrelated projects from each initial repository:

  1. Make a fresh clone of the repository. filter-repo refuses to rewrite a repository that is not a fresh clone unless you pass --force, so a new clone is the safest place to work.
  2. Run the following command in PowerShell from the repository root folder: git filter-repo --path [PROJECT1]/ --path [PROJECT2]/ --prune-empty always --tag-rename '':'[OLDREPONAME]-'
    1. --path can be repeated once for each project we want in our new repository.
    2. --prune-empty always tells filter-repo to drop every commit that is empty after filtering, including commits that were already empty in the original repository (the default, auto, keeps those). Commits that only touched other projects end up empty, so they disappear from the history.
    3. --tag-rename renames all tags from this repository by prefixing them with [OLDREPONAME]-. This way we can tell the tags of the old repositories apart in the new repository.
  3. The folder should now contain a hidden .git folder and one folder for each of the projects you want to keep. In this example that is three folders: .git, PROJECT1 and PROJECT2.
  4. Repeat this for both repositories.
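Because Step A has to be repeated for every repository, it can be worth scripting. Here is a minimal Python sketch, assuming each project is a top-level folder and that you run it inside a fresh clone; filter_repo_args and filter_repository are hypothetical helper names. Passing the arguments as a list means no shell ever sees the quotes in '':'[OLDREPONAME]-', so filter-repo simply receives :OLDREPONAME- (an empty old prefix and a new prefix).

```python
import subprocess


def filter_repo_args(projects: list[str], old_repo_name: str) -> list[str]:
    """Build the Step A command as an argument list (no shell quoting involved)."""
    args = ["git", "filter-repo"]
    for project in projects:
        # Same form as the article's command: the project folder followed by a slash.
        args += ["--path", project.rstrip("/") + "/"]
    args += ["--prune-empty", "always"]
    # OLD:NEW with an empty OLD prefix puts NEW in front of every tag name.
    args += ["--tag-rename", f":{old_repo_name}-"]
    return args


def filter_repository(repo_dir: str, projects: list[str], old_repo_name: str) -> None:
    """Run Step A inside the fresh clone located at repo_dir."""
    subprocess.run(filter_repo_args(projects, old_repo_name), cwd=repo_dir, check=True)
```

For example, filter_repository(r"C:\repos\OLDREPOSITORY1", ["PROJECT1", "PROJECT2"], "OLDREPOSITORY1") runs the equivalent of the command in step 2 for one repository.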

We should now have two repositories with two projects in each.
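To confirm step 3 for each repository without eyeballing the folder, a small Python check can compare the top-level entries with the projects you asked for. check_layout is a hypothetical helper, not part of filter-repo, and it only looks at the working tree, not at the history.

```python
from pathlib import Path


def check_layout(repo_dir: str, projects: list[str]) -> tuple[set[str], set[str]]:
    """Compare the top-level entries of repo_dir with the projects that should remain.

    Returns (missing, unexpected); both are empty when the layout matches.
    """
    # .git is hidden but always present in a clone, so it is never "unexpected".
    present = {p.name for p in Path(repo_dir).iterdir() if p.name != ".git"}
    wanted = {p.rstrip("/") for p in projects}
    return wanted - present, present - wanted
```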

Step B is to merge these repositories into our new repository:

  1. Create a new folder: newRepo.
  2. Run git init in this folder.
  3. Add each repository from Step A as a remote:
    1. git remote add [OLDREPOSITORY] [PATHTOOLDREPOSITORY]\.git
    2. Example: git remote add OLDREPOSITORY1 C:\Users\[USER]\Desktop\repositories\OLDREPOSITORY1\.git
  4. Fetch the history of all the remotes by running git fetch --all.
  5. For each remote added run: git merge [REMOTE]/[BRANCH] --allow-unrelated-histories.
  6. Remove each added remote by running git remote rm [REMOTE].
  7. Create a new, empty repository in your favorite git hosting solution.
  8. Run git remote add origin [REPOURL] to connect your local repository to the new remote repository.
  9. Run git push --set-upstream origin master to push the code to the master branch of the remote repository and make your local master branch track it. If your local default branch has a different name (for example main), use that name instead.
    1. If your hosted repository manager creates an initial commit in the repository you are pushing to, you need to add --force to your push command, which overwrites that initial commit.
  10. Depending on your old setup, you might want to add a .gitignore file to the new repository.
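The local part of Step B (steps 2 to 6) can be sketched in Python as a list of commands. The names merge_commands and run_all are hypothetical, the branch name master is an assumption carried over from the push step, and --no-edit is added so that each merge accepts git's default merge message instead of opening an editor. Creating the remote repository and pushing are left out on purpose.

```python
import os
import subprocess


def merge_commands(remotes: dict[str, str], branch: str = "master") -> list[list[str]]:
    """Build the Step B commands that merge filtered repositories into one.

    remotes maps a remote name to the local path of a filtered clone from Step A.
    """
    cmds = [["git", "init"]]
    for name, path in remotes.items():
        cmds.append(["git", "remote", "add", name, path])
    cmds.append(["git", "fetch", "--all"])
    for name in remotes:
        # --no-edit keeps git's default merge message instead of opening an editor.
        cmds.append(["git", "merge", f"{name}/{branch}",
                     "--allow-unrelated-histories", "--no-edit"])
    for name in remotes:
        cmds.append(["git", "remote", "rm", name])
    return cmds


def run_all(cmds: list[list[str]], new_repo_dir: str) -> None:
    """Run the commands in order inside new_repo_dir, stopping at the first failure."""
    os.makedirs(new_repo_dir, exist_ok=True)
    for cmd in cmds:
        subprocess.run(cmd, cwd=new_repo_dir, check=True)
```

The first merge simply fills the empty repository; --allow-unrelated-histories is what lets the second merge join a history that shares no commits with the first.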

Congrats, you did it! Now go to your new repository and verify that all the wanted files are there and that the git history contains no unrelated files, duplicate commits or other nonsense.
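Part of that check can be automated. The sketch below assumes you run it in the new repository: touched_paths collects every file path that appears in the history via git log --name-only, and unrelated_paths (a hypothetical helper) reports paths outside the kept projects. Files you added on purpose, such as a new .gitignore, can be allowed through extra.

```python
import subprocess


def touched_paths(repo_dir: str) -> set[str]:
    """Every file path that appears in a commit of the current branch's history."""
    out = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # With an empty --pretty format the output is only file names and blank lines.
    # Note: names with unusual characters may be quoted unless core.quotePath is false.
    return {line for line in out.splitlines() if line.strip()}


def unrelated_paths(paths: set[str], projects: list[str],
                    extra: frozenset[str] = frozenset()) -> set[str]:
    """Paths that lie outside every kept project and are not explicitly allowed."""
    prefixes = tuple(p.rstrip("/") + "/" for p in projects)
    return {p for p in paths if not p.startswith(prefixes) and p not in extra}
```

Usage: unrelated_paths(touched_paths("."), ["PROJECT1", "PROJECT2", "PROJECT3", "PROJECT4"]) should come back empty if Step A removed everything it was supposed to.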
