GitLab on premise to GitHub cloud Migration

Bhargavi Chiluka

Introduction

According to the latest public notice from GitLab, security and maintenance support for our on-premise server instance would stop from 2024.

Once this takes effect, we would be left with on-premise code hosting servers receiving no updates, and on-demand maintenance would become a special case to consider.

Goal

Understanding this situation, let's see how we can move our on-premise code hosting to our organization's enterprise GitHub platform.

The migration process is planned to achieve:

  1. Zero loss of commit history
  2. No loss of access for teams/groups
  3. No halt or impact to our current deployment strategies

The migration process involves

  • 160+ repositories,
  • 85+ deployment pipelines, and
  • 3 Jenkins server instances,

which makes it a critical and complex migration with no room for error.

Strategy

Considering all the use cases, I drafted the migration strategy as follows.

Since we are migrating from ON PREMISE to CLOUD, we cannot utilise the migration tools provided by either GitLab or GitHub.

Hence, it's an entirely manual process with a lot of repetition, which boils down to the following steps.

  1. Clone the repo to an intermediate device that can reach both the private on-premise instance and the cloud over the internet.
  2. Detach the on-premise server configuration from the cloned repo.
  3. Create a new repo under the organization's GitHub Cloud account with the necessary naming conventions and role-based access.
  4. Point the cloned repo's remote configuration at the new repo.
  5. Synchronise the code base with all branches and commit history. (A tough job)
  6. So that the deployment process does not fail, create the necessary public/private key pairs and store them in the repo-level deploy key store.
  7. Generate a unique SSH config and clone URL per repo per key to avoid key collision issues in the CI/CD instances (Jenkins).

1. Cloning the repo

As explained above, the clone device must be able to reach both code hosting platforms for a seamless migration.

Cloning is achieved with

git clone <ssh-based git clone URL>
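For instance, with a hypothetical on-premise GitLab host the command takes this shape (the host, group and repo names below are just placeholders, not the real instance):

# <gitlab-host>, <group> and <repo-name> are placeholders for the on-premise instance
git clone git@<gitlab-host>:<group>/<repo-name>.git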

2. Detach the server configuration

This is as simple as removing the remote from the configuration after the full clone:

git remote remove <remote-name>

3. Create new repo in cloud with RBAC

Per the organization's standard practice, we should create either a private or an internal repo. Considering the access requirements, I opted for private repos.

But with such a huge count of repositories, creating each repo through the UI is tedious, and plain git has no support for creating remote repositories.

Hence, I adopted the portable GitHub CLI (gh), logged in with GitHub credentials, and achieved the task via terminal scripts, as follows:

gh repo create <org>/<repo-name> --private --source=<cloned-repo-path> --team=<team-handle>
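Since this has to be repeated for 160+ repositories, the creation can be wrapped in a small loop. A minimal sketch, assuming a plain-text file repos.txt with one repo name per line and that each clone sits in a directory of the same name (both assumptions are mine, not from the original setup):

# repos.txt is a hypothetical list file: one repo name per line
while read -r repo; do
  gh repo create "<org>/${repo}" --private --source="./${repo}" --team="<team-handle>"
done < repos.txt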

4. Establish the connection from the local repo to the new repo

The above command with --source creates a sync problem, because it only considers the default (main) branch. Our target of pushing all branches with all commits isn't met yet.

Hence a manual remote addition and push to the new repo are required:

cd <cloned-repo-path>
git remote add origin <new-repo-ssh-url>
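A quick optional check (not part of the original steps) to confirm the wiring before pushing:

# origin should now point at the new GitHub repo, and only at it
git remote -v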

5. Syncing all branches and commits

As explained above, a clone only checks out the main/default branch.

Though all the branches were cloned, they cannot be pushed directly to the new repo unless they are checked out.

Reason: when a repo is cloned, git downloads the branches as objects and remote-tracking refs; a local branch is only created when you check it out. Hence a plain push from the local clone only pushes the default/main branch.

From the above, it's clear that we would have to check out every branch to push them all to the new repo. But how do we know how many branches there are? Listing all branches and looping over them is a brute-force approach and a bad idea.

So instead we push all refs in one go, without checking out each branch locally, with a mirror push:

git push --mirror
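To sanity-check the sync afterwards, one option (a suggestion of mine, not part of the original steps) is to compare the refs visible on the new remote with the refs known locally:

# branches and tags now present on the new GitHub remote
git ls-remote --heads --tags origin
# local refs, for comparison
git show-ref --heads --tags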

6. Deploy Keys Setup

Since we have 160+ repos used across 85+ pipelines, and the servers were on premise earlier, we had the SSH key of a single user account added to the Jenkins instances so that code pulls would succeed.

But with GitHub Cloud involved, the same strategy is no longer possible, given the access management and access lifecycle policies of the org.

The options that had to be ruled out were:

  1. Adding an SSH key to an existing developer account
  2. A bot user account with an SSH key
  3. A personal access token for either of the above
  4. A fine-grained access token for either of the above

That left us with the last remaining option: repo-level, read-only deploy keys.

As mentioned, these keys are

  • Repo level
  • Read-only access controlled
  • Unique to usage (one device, one key)

Hence the deploy key addition is handled via the portable GitHub CLI as follows.

Create a passwordless key first:

ssh-keygen -t rsa -b 4096 -N "" -f <path>/<repo-name-as-certificate-name>


The above generates a private/public key pair in the mentioned path.
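For example, with -f <path>/<repo-name-as-certificate-name> the following two files should appear in that path:

ls <path>
# <repo-name-as-certificate-name>       private key, kept on the device/CI host that pulls the code
# <repo-name-as-certificate-name>.pub   public key, added to the repo as the deploy key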

Add the public key to the repo via the CLI:

gh repo deploy-key add <path-to-certificate>/<repo-name-as-certificate>.pub --repo <org>/<repo-name>

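As with repo creation, key generation and upload can be scripted per repo. A minimal sketch under the same repos.txt assumption as before:

while read -r repo; do
  # passwordless key pair named after the repo
  ssh-keygen -t rsa -b 4096 -N "" -f "<path>/${repo}"
  # upload the public half as a read-only deploy key
  gh repo deploy-key add "<path>/${repo}.pub" --repo "<org>/${repo}"
done < repos.txt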

7. Generate unique SSH Config to avoid key collision.

With a unique deploy key for each repo, the git clone command cannot automatically pick the right key for a repo, even if we dump all the keys in the ~/.ssh directory, and that leads to time-out issues while cloning.

To solve this, we would have to tell the git clone command which key to use for each clone.

This might be feasible for a limited number of use cases, but not for automation stages like the ones in Jenkins.

To achieve scalability, we can instead write an SSH configuration that uses a different key for each host (domain).

But with every repo on the same github.com, how do we get different host names for different repos? This too is solved with host name resolution in the SSH config.

As a whole, we can write a config that tells ssh to use key1 for test1.com and to resolve test1.com to github.com:

Host test1.com
    HostName github.com
    IdentityFile <path-to-key-file>
    # If ci/cd is behind a firewall, add a proxy
    ProxyCommand nc --proxy-type socks5 --proxy <proxy-address> %h %p

And the git clone command that picks up the above config looks like:

git clone git@test1.com:<org>/<new-repo-name>
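Before wiring the new URL into Jenkins, the host alias and key can be verified from the CI box (assuming the config above is in ~/.ssh/config); GitHub answers with an authentication greeting naming the repo the deploy key belongs to:

# expected reply is along the lines of
# "Hi <org>/<new-repo-name>! You've successfully authenticated, but GitHub does not provide shell access."
ssh -T git@test1.com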

Conclusion

The whole strategy explained above looks like the following for a single repo, and is then looped over all the repo names to be migrated.

#step 1
git clone <on premise server URL>/<repo-name>
#step 2
cd <repo-name>
git remote remove origin
cd ..
#step 3
gh repo create <org>/<new-repo-name> --private --source=<cloned-repo-path> --team=<team-handle>
#step 4
cd <repo-name>
git remote add origin git@github.com:<org>/<new-repo-name>.git
#step 5
git push --mirror
#step 6
ssh-keygen -t rsa -b 4096 -N "" -f <path>/<new-repo-name>
# since we are still in the repo dir, gh will auto pick up the correct repo from the origin config
gh repo deploy-key add <path-to-certificate>/<new-repo-name>.pub

#step 7
# this config needs to be stored in the ~/.ssh/config of the device/user running the pipeline
Host github-<new-repo-name>.com
    HostName github.com
    IdentityFile <path-to-key-file>/<new-repo-name-as-private-key-cert>
    # If ci/cd is behind a firewall, add a proxy
    ProxyCommand nc --proxy-type socks5 --proxy <proxy-address> %h %p

# updated git clone command with the new URL to be used in the pipeline
git clone git@github-<new-repo-name>.com:<org>/<new-repo-name>
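Put together, the per-repo flow can be wrapped in one loop over the list of repo names. A rough sketch under the same repos.txt assumption as earlier; the org, team, and path values are placeholders to fill in, and the new repo is created empty here (without --source) since the mirror push in step 5 brings over all the content anyway:

#!/usr/bin/env bash
set -euo pipefail

ORG="<org>"
TEAM="<team-handle>"
KEY_DIR="<path-to-certificates>"

mapfile -t repos < repos.txt
for repo in "${repos[@]}"; do
  # steps 1-2: clone from on premise and detach the old remote
  git clone "<on premise server URL>/${repo}"
  git -C "${repo}" remote remove origin

  # steps 3-5: create the new private repo, point origin at it, mirror all refs
  gh repo create "${ORG}/${repo}" --private --team="${TEAM}"
  git -C "${repo}" remote add origin "git@github.com:${ORG}/${repo}.git"
  git -C "${repo}" push --mirror

  # step 6: per-repo passwordless deploy key
  ssh-keygen -t rsa -b 4096 -N "" -f "${KEY_DIR}/${repo}"
  gh repo deploy-key add "${KEY_DIR}/${repo}.pub" --repo "${ORG}/${repo}"

  # step 7: append a host alias for this repo to the SSH config used by CI
  cat >> ~/.ssh/config <<EOF
Host github-${repo}.com
    HostName github.com
    IdentityFile ${KEY_DIR}/${repo}
EOF
done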
