DEV Community

Artem Pakhomov

How to save old repositories in one including git history

Original in Russian: https://habr.com/ru/post/570404/

A couple of years ago at work, a colleague (hello!) who knows my love for automation approached me with a rather non-trivial request. We needed to clean up old repositories in the corporate GitHub org: not delete them outright, but keep them just in case, and keep them together with their git history. He and I quickly sketched a bash script that took orgName/projectName as an argument. When the script finished, it pushed the code to a separate branch, which could then be merged into the main branch of the archive repository. The script was written quickly and solved the problem, but a couple of steps still had to be done by hand. In that case this was fine, since archiving an old project required confirmation anyway. Back then I had the idea of turning it into a GitHub Actions workflow, a tool I had only just gotten to know, but I had no free time for it. Now, a year and a half later, I have finally carried out that plan: this is how the Bygone project was born, which is available on GitHub.


In 2017-2018, I studied at a branch of the French School 42 and had to write a lot of C. There were many educational projects, ranging from reverse engineering of the C standard library, printf, and the sha / md5 / des algorithms, to my own projects unrelated to School 42: a simple DNS proxy, a network packet sniffer, reverse engineering of the Minesweeper game... Now I understand how terrible that code is, but I want to keep it as a memory of sleepless nights spent over segfaults, bus errors, lost pointers, and memory leaks. Two dozen repositories of memories.

My GitHub had needed a cleanup for a long time, and I kept putting off the idea of making a fully automatic collector until I had more free days. Now I have had a cough for four days and a temperature of 37.5 °C for two, I can't sleep, so it's time to finish it. I created a repository for the project a few months ago, but never got around to working on it.

Collecting requirements

  1. The script should put each branch of the project in a separate folder
  2. The script must keep the commit history
  3. The script must be resistant to merge conflicts of the rename/rename type
  4. The script must run in GitHub Actions and be triggered manually
  5. The script should accept, as environment variables, the archive repository, the repository being archived, the main archive branch to merge everything into, and the GitHub token and username used to push the result
  6. The script should work without any console intervention (for example, a new project can be added to the archive from a phone's browser)
  7. The script should be as simple as possible, to encourage people to help improve it (<3 open source)
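
Requirement 5 can be made concrete with a small check that all inputs are present before any git work starts. The sketch below is only an illustration: `missing_vars` is a hypothetical helper that is not part of the author's script, the values are placeholders, and the variable names are borrowed from the secrets used in the workflow in this article.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the author's script): reports every
# named variable that is unset or empty and returns 1 if any is missing.
missing_vars() {
  local name rc=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or an empty string if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Placeholder values for illustration only.
ARCHIVED_REPOSITORY="someuser/old-project"
ARCHIVE_REPOSITORY="someuser/archive"
ARCHIVE_REPOSITORY_DEFAULT_BRANCH="main"
GIT_USERNAME="someuser"
WRITE_GITHUB_TOKEN="not-a-real-token"

if missing_vars ARCHIVED_REPOSITORY ARCHIVE_REPOSITORY \
    ARCHIVE_REPOSITORY_DEFAULT_BRANCH GIT_USERNAME WRITE_GITHUB_TOKEN; then
  echo "all inputs present"
fi
```

Failing early like this keeps a half-configured run from rewriting any working tree.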

Implementation

Given the requirements, I decided to write the script itself in bash, although I am not good at it. The logic is quite simple: get the list of branches, check out a branch, move the project's contents into a subfolder, commit, repeat for every branch, then add the archive as a new remote and merge all the branches into it. There is one nuance: if we move everything into a subfolder on one branch and then do the same on a neighboring branch, merging both can produce a merge conflict of the rename/rename type, because each branch renamed the same files to a different path. So I added an extra git add && git commit after each merge. If there was no conflict, the commit does nothing (git ignores the command, since there is nothing to commit); if there was a rename/rename conflict, it resolves it.
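
The core of this logic can be tried locally without GitHub. The following is a minimal sketch under stated assumptions, not the author's script: it builds two throwaway repositories in a temporary directory, all names (`old`, `archive`, `old-project`, the wrapper `g`) are made up for the demo, and it uses `git mv` to move the files, whereas the workflow in this article moves the `.git` directory aside instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every path and name below is made up for the demo.
work="$(mktemp -d)"
# Wrapper that supplies a commit identity without touching global git config.
g() { git -c user.name=demo -c user.email=demo@example.com --no-pager "$@"; }

# 1. An "old project" with one commit on branch main.
mkdir "$work/old"
cd "$work/old"
g init -q .
g symbolic-ref HEAD refs/heads/main
echo 'int main(void) { return 0; }' > main.c
g add main.c
g commit -q -m "old project: initial commit"

# 2. Move the tree into <project>/<branch>/ as a new commit on top of
#    the old history.
mkdir -p old-project/main
g mv main.c old-project/main/
g commit -q -m "Preparing old-project/main for move"

# 3. An "archive" repository with its own, unrelated history.
mkdir "$work/archive"
cd "$work/archive"
g init -q .
g symbolic-ref HEAD refs/heads/main
echo "archive" > README.md
g add README.md
g commit -q -m "archive: initial commit"

# 4. Fetch the prepared branch and merge it. The two histories share no
#    ancestor, so git needs --allow-unrelated-histories.
g remote add old "$work/old"
g fetch -q old
g merge -q --no-edit --no-ff --allow-unrelated-histories old/main

# The archive now contains both histories plus one merge commit.
commit_count="$(g rev-list --count HEAD)"
echo "commits reachable from archive HEAD: $commit_count"
# --follow traces the file back through the move into the subfolder.
g log --oneline --follow -- old-project/main/main.c
```

Because the move is just one more commit on the old branch, the original commits stay reachable after the merge, which is what keeps the history in the archive.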

Config for GitHub Actions:

name: archivist
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Archived Repository
        uses: actions/checkout@v2
        with:
          repository: ${{ secrets.ARCHIVED_REPOSITORY }}
          token: ${{ secrets.WRITE_GITHUB_TOKEN }}
          path: archived
          fetch-depth: 0
      - run: |
          cd ./archived
          git config user.name github-actions
          git config user.email github-actions@github.com
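          # Collect the names of all remote branches (third field of remotes/origin/<name>)
          existBranches=($(git branch -a | grep remotes | awk -F '/' '{ print $3  }'));
          for i in "${existBranches[@]}"; do
               # Start from a clean working tree on a local copy of the branch
               git checkout -- .
               git checkout -b ${i}-move origin/${i} || true
               git reset --hard HEAD
               # Set the .git directory aside, then move the working tree
               # into <repository>/<branch>/ under a temporary root
               mkdir -p ../tmp/${{ secrets.ARCHIVED_REPOSITORY }}
               mv ./.git ../tmp/.git
               cd ..
               mv ./archived/ tmp/${{ secrets.ARCHIVED_REPOSITORY }}/${i}/
               # Commit the move on top of the branch's existing history
               cd ./tmp/
               git add .
               git commit -m "Preparing ${{ secrets.ARCHIVED_REPOSITORY }}/${i} for move"
               # Put the repository back where the next iteration expects it
               cd ../
               mv ./tmp/ ./archived/
               cd ./archived
          done;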
          # Drop the auth header configured by actions/checkout so that the
          # credentials in the remote URL below are used instead
          git config -l | grep 'http\..*\.extraheader' | cut -d= -f1 | xargs -L1 git config --unset-all
          # Add the archive repository as a remote and check out its default branch
          git remote add archive "https://${{ secrets.GIT_USERNAME }}:${{ secrets.WRITE_GITHUB_TOKEN }}@github.com/${{ secrets.ARCHIVE_REPOSITORY }}"
          git fetch archive
          git checkout -b tmp_${{ secrets.ARCHIVE_REPOSITORY_DEFAULT_BRANCH }} archive/${{ secrets.ARCHIVE_REPOSITORY_DEFAULT_BRANCH }} || true
          for i in "${existBranches[@]}"; do
               # Merge the prepared branch; failures (including conflicts) are ignored here
               git merge ${i}-move --allow-unrelated-histories --no-edit --no-ff > /dev/null 2>&1 || true
               # Conclude a conflicted merge by staging everything; with
               # nothing to commit, the commit fails harmlessly
               git add .
               git commit -m "resolve conflicts for ${{ secrets.ARCHIVE_REPOSITORY }}/$i" || true
               # Push the result to the archive's default branch
               git push --force --quiet "https://${{ secrets.GIT_USERNAME }}:${{ secrets.WRITE_GITHUB_TOKEN }}@github.com/${{ secrets.ARCHIVE_REPOSITORY }}" tmp_${{ secrets.ARCHIVE_REPOSITORY_DEFAULT_BRANCH }}:${{ secrets.ARCHIVE_REPOSITORY_DEFAULT_BRANCH }}
          done;
          >&2 echo "All done" ;
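
One detail of the branch listing in the workflow is worth noting: the awk expression keeps only the third `/`-separated field, so a branch named `feature/parser` would be reduced to `feature`. A possible alternative, shown here only as a sketch and not as the author's method, is `git for-each-ref` with the `lstrip` format modifier (this assumes a reasonably recent git). The helper name `list_remote_branches` and the sandbox repositories are made up for the demo.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints one branch name per line for the remote
# "origin", keeping names that contain slashes intact.
list_remote_branches() {
  # lstrip=3 drops the leading "refs/remotes/origin/" path components;
  # the symbolic origin/HEAD entry is filtered out.
  git for-each-ref --format='%(refname:lstrip=3)' refs/remotes/origin \
    | grep -v '^HEAD$'
}

# Demo in a throwaway sandbox: a source repository with two branches,
# one of which has a slash in its name, and a clone of it.
work="$(mktemp -d)"
mkdir "$work/src"
cd "$work/src"
git init -q .
git symbolic-ref HEAD refs/heads/main
git -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"
git branch feature/parser

git clone -q "$work/src" "$work/clone"
cd "$work/clone"
branches="$(list_remote_branches)"
echo "$branches"
```

In the workflow, the resulting names could be read into the `existBranches` array in place of the `git branch -a | grep | awk` pipeline, although branch names containing slashes would also create nested target folders.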

Summary

I wrote the script in a couple of hours, and then spent another 40 minutes merging more than 20 repositories into the archive (and finally deleting them). Of course, I would like to be able to pass in a whole list of repositories at once, but the current solution worked for me. You can add a new project to the archive at any time.

Begone on GitHub

Example archive