Git Explained (5 Part Series)
Development teams often have to decide how to structure their Git repositories to handle the different parts of a project. Some teams opt for a monorepo, while others break the project down into sub-projects, for instance, one repository for the UI and another for the API. Both Git structures have advantages and drawbacks.
Monorepos offer benefits such as code sharing, less code duplication, a single build, and atomic commits. However, monorepos also accumulate a large number of commits in unrelated parts of the tree and can suffer performance issues when cloning, fetching, and pushing code. Sub-projects, on the other hand, mean multiple repositories to clone, multiple sets of instructions to follow in order to install and build, and multiple directories to keep up with.
Recently, I’ve been introduced to the use of Git Submodules to create a central umbrella repository that points to a particular commit in both the UI and API external repositories. This umbrella structure allows you to keep a sub-project composition with some of the advantages of using a monorepo such as managing documentation and configurations in one central place, and the ability to pull the whole project, including sub-projects, in one step. This post is going to discuss the benefits of using an umbrella repository structure using Git Submodules, the struggles that come with it, and the commands that you need to learn to start using this Git structure.
Benefits
Sub-project composition. Having multiple repositories for the different parts of your project allows you to give limited access on a “need to code” basis. For instance, you might be in a situation where a frontend developer is added to your team for a few weeks to speed up the UI implementation. This developer can easily choose to pull only your UI repository, without having to mess with the API.
Continuous Development. The proposed umbrella structure using Git Submodules keeps a sub-project composition, which allows each repository to have its own deployment process.
Documentation in one place. The umbrella repository gives you a single location to store documentation that applies to the whole project. I like to keep a global Markdown file in the umbrella repository with instructions to build and run the whole project using Docker, plus a Markdown file in each sub-project repository with instructions on how to install and use that sub-project locally.
Configurations in one place. Our team uses Docker containers to create, deploy, and run applications. The umbrella repository is a fantastic spot for a docker-compose.yml file that builds and runs a database image along with your API and UI environments in one command.
Atomic code pulling. By using the --recurse-submodules flag when cloning an umbrella repository, you get the umbrella project and the contents of every submodule in one command. In other words, you get your UI and API repositories all at once.
Struggles
Need to specify the use of Git Submodules in instructions. If your team is not used to this Git structure, they most likely won’t know how to clone or handle submodules correctly. Git does not download submodule contents when cloning by default, so make sure to include this information in your README.
Manual retrieval of submodule updates. Git won’t automatically pull in updates made to submodules. You need to run git submodule update --remote to bring those changes into your umbrella project.
Simple implementations only. So far, I’ve only used this umbrella structure to link the UI and API repositories per project. I can see how keeping up with more than two submodules could get challenging and might require you to look into a different implementation.
Mostly useful when using Docker. If you are not using Docker, the benefits of this structure might not be as impactful.
Let’s discuss the possibilities and commands you have in your toolbox to implement this umbrella structure:
- Create a project with submodules
- Rename a submodule
- Clone a project with submodules
- Work in a submodule
- Check and pull submodule updates
- Update umbrella project with latest submodule commits
Create a project with submodules
To demonstrate, I will create a repository called umbrella and two repositories called umbrella-ui and umbrella-api on GitHub. Then, from inside the umbrella project, add each sub-project as a submodule:
cd [umbrella-project-name]
git submodule add [git@github.com:your-account/umbrella-ui.git] [name]
Note: the name parameter is optional. It sets the directory the submodule is cloned into. If you don’t pass a name, the directory gets the submodule repository’s name. In this case, I will pass “ui” as the name because I don’t want a redundant directory name like “umbrella-ui” inside my “umbrella” project.
Run the ls -a command to see a new directory called ui and a new file called .gitmodules. The .gitmodules file is a configuration file that stores the mapping between your main project and its submodules. It will look similar to the following:
[submodule "ui"]
	path = ui
	url = git@github.com:your-account/umbrella-ui.git
After committing and pushing this change, you should see a folder for your submodule that points to the latest commit in the submodule's master branch.
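You can try the steps above end to end with a small shell sketch. It assumes that local bare repositories in a temporary directory stand in for the GitHub remotes, so the make_remote helper, the demo identity, and all paths are hypothetical. The protocol.file.allow setting is only needed because the submodule URLs here are local paths.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-ui
make_remote umbrella-api

# Clone the umbrella project, then add both sub-projects under short paths.
git clone -q "$work/remotes/umbrella.git" "$work/umbrella"
cd "$work/umbrella"
git submodule add "$work/remotes/umbrella-ui.git" ui
git submodule add "$work/remotes/umbrella-api.git" api

# .gitmodules maps each submodule to its path and URL.
cat .gitmodules
# One line per submodule: the commit its directory is checked out at.
git submodule status

# Commit and push the new submodules (already staged by "git submodule add").
git commit -q -m "Add ui and api submodules"
git push -q origin master
```

The umbrella commit does not store the sub-projects’ files. It stores a gitlink entry for ui and api that records a single commit each. That is why GitHub shows each submodule as a folder pointing at one commit.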
Rename a submodule
If you forget to pass a name parameter when adding your submodule, you can still rename the submodule directory with the following command:
git mv [old-name] [new-name]
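Here is a sketch of the rename as a local experiment. Bare repositories in a temporary directory replace GitHub, and the make_remote helper and paths are hypothetical. The submodule is added without a name, so its directory defaults to umbrella-ui, and git mv then moves it to ui.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-ui

git clone -q "$work/remotes/umbrella.git" "$work/umbrella"
cd "$work/umbrella"
# No name/path argument, so the directory is named after the repository.
git submodule add "$work/remotes/umbrella-ui.git"
git commit -q -m "Add umbrella-ui submodule"

# Rename the directory; git mv also updates and stages .gitmodules.
git mv umbrella-ui ui
cat .gitmodules
git commit -q -m "Rename umbrella-ui submodule to ui"
```

Only the path entry in .gitmodules changes. The section header keeps the original submodule name ([submodule "umbrella-ui"]), which Git uses as an internal identifier.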
Clone a project with submodules
Run the following command to clone your project and its submodules in one step. If you don’t pass the --recurse-submodules flag, your submodule directories will be empty.
git clone --recurse-submodules [git-repo-url]
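This sketch contrasts the two ways of cloning. It again assumes local bare repositories stand in for GitHub, and the make_remote helper and paths are hypothetical. It also shows git submodule update --init --recursive, which fills in empty submodule directories when a project was cloned without --recurse-submodules.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-ui
make_remote umbrella-api

# Publish an umbrella project that already has both submodules.
git clone -q "$work/remotes/umbrella.git" "$work/setup"
cd "$work/setup"
git submodule add "$work/remotes/umbrella-ui.git" ui
git submodule add "$work/remotes/umbrella-api.git" api
git commit -q -m "Add ui and api submodules"
git push -q origin master
cd "$work"

# One step: clone the umbrella project and every submodule.
git clone -q --recurse-submodules "$work/remotes/umbrella.git" full
ls full/ui full/api

# Without the flag, ui/ and api/ exist but are empty...
git clone -q "$work/remotes/umbrella.git" plain
ls -A plain/ui
# ...until the submodules are initialized and checked out.
git -C plain submodule update --init --recursive
ls plain/ui plain/api
```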
Work in a submodule
Working in a submodule follows the same process you use with a regular repository: make changes, commit them, and push them to the remote.
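After a recursive clone, each submodule is checked out at a detached HEAD (the exact commit the umbrella records), so switch to a branch before committing. The sketch below uses the same local stand-ins for GitHub. The make_remote helper, the paths, and the index.js file are hypothetical.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-ui

# Publish an umbrella project with a ui submodule, then clone it recursively.
git clone -q "$work/remotes/umbrella.git" "$work/setup"
(cd "$work/setup" &&
  git submodule add "$work/remotes/umbrella-ui.git" ui &&
  git commit -q -m "Add ui submodule" &&
  git push -q origin master)
git clone -q --recurse-submodules "$work/remotes/umbrella.git" "$work/umbrella"

# The submodule starts at a detached HEAD; switch to a branch first.
cd "$work/umbrella/ui"
git checkout -q master
echo "console.log('hello');" > index.js
git add index.js
git commit -q -m "Add index.js"
git push -q origin master

# Back in the umbrella, the ui pointer now shows as modified.
cd "$work/umbrella"
git status --short
```

The umbrella repository reports ui as modified because the submodule’s HEAD no longer matches the commit it records. That pointer still has to be committed in the umbrella project.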
Check and pull submodule updates
When changes to a submodule are pushed to its remote repository, the commit pointers in your umbrella repository become out of date. You need to check for those changes and update your umbrella repository.
To illustrate this concept, suppose the umbrella-api repository has been updated with a new last commit on master. If we look at our umbrella repository, the api submodule is not pointing to that latest commit in umbrella-api. Instead, it is still pointing at an older commit.
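Here is a sketch for spotting an outdated pointer from the command line. It assumes local bare repositories stand in for GitHub and that a hypothetical teammate clone pushes the new api commit; make_remote, the paths, and the VERSION file are illustrative. git submodule status prints the commit the api directory is checked out at. Fetching inside the submodule then lets git log list the commits the pointer is missing.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-api

git clone -q "$work/remotes/umbrella.git" "$work/umbrella"
cd "$work/umbrella"
git submodule add "$work/remotes/umbrella-api.git" api
git commit -q -m "Add api submodule"

# A hypothetical teammate pushes a new commit to umbrella-api.
git clone -q "$work/remotes/umbrella-api.git" "$work/teammate-api"
(cd "$work/teammate-api" &&
  echo "v2" > VERSION &&
  git add VERSION &&
  git commit -q -m "Add VERSION" &&
  git push -q origin master)

# The commit api is checked out at, which still matches the old pointer.
git submodule status api
# Fetch inside the submodule and list the commits the pointer is missing.
git -C api fetch -q origin
git -C api log --oneline HEAD..origin/master
```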
You can pull the changes made in all your submodules directly from the umbrella directory with the following commands:
# fetch all new updates in submodules and check out the latest remote commits
git submodule update --remote
# see which submodules have been modified
git status
# list the new commits that were pulled down into each submodule
git diff --submodule=log
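This runnable sketch of these commands rests on the same assumptions: local bare repositories replace GitHub, and make_remote, the teammate clone, and the VERSION file are hypothetical. The --submodule=log option makes git diff print the new commits for each submodule instead of only the old and new hashes.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-api

git clone -q "$work/remotes/umbrella.git" "$work/umbrella"
cd "$work/umbrella"
git submodule add "$work/remotes/umbrella-api.git" api
git commit -q -m "Add api submodule"

# A hypothetical teammate pushes a new commit to umbrella-api.
git clone -q "$work/remotes/umbrella-api.git" "$work/teammate-api"
(cd "$work/teammate-api" &&
  echo "v2" > VERSION &&
  git add VERSION &&
  git commit -q -m "Add VERSION" &&
  git push -q origin master)

# Fetch every submodule and check out its latest remote commit.
git submodule update --remote
# The api submodule moved, but the new pointer is not committed yet.
git status --short
# List the commits that were pulled into each submodule.
git diff --submodule=log
```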
Update umbrella project with latest submodule commits
Now that we have pulled the changes made in our submodules, we need to update our umbrella project’s pointers. This is VERY IMPORTANT: if someone clones our umbrella project in this state, they won’t get the newest code in our submodules.
# stage the updated submodule pointers (and any other changes in the umbrella)
git add .
# commit the new pointers
git commit -m "Update submodule pointers"
# publish the changes
git push origin master
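The sketch below uses the same local-remote assumptions; make_remote, the teammate clone, and the VERSION file are hypothetical. It records the new api pointer, pushes it, and then checks that a fresh recursive clone receives the newest submodule code. It stages only the api path instead of using git add ., which keeps unrelated changes out of the pointer commit.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-api

git clone -q "$work/remotes/umbrella.git" "$work/umbrella"
cd "$work/umbrella"
git submodule add "$work/remotes/umbrella-api.git" api
git commit -q -m "Add api submodule"

# A hypothetical teammate pushes a new commit to umbrella-api.
git clone -q "$work/remotes/umbrella-api.git" "$work/teammate-api"
(cd "$work/teammate-api" &&
  echo "v2" > VERSION &&
  git add VERSION &&
  git commit -q -m "Add VERSION" &&
  git push -q origin master)

git submodule update --remote

# Record the new api pointer in the umbrella project and publish it.
git add api
git commit -q -m "Update submodule pointers"
git push -q origin master

# A fresh recursive clone now gets the newest submodule code.
git clone -q --recurse-submodules "$work/remotes/umbrella.git" "$work/fresh"
cat "$work/fresh/api/VERSION"
```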
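Note: If you already have local work on a branch inside a submodule, add the --merge flag. Instead of checking out the latest remote commit as a detached HEAD, Git merges it into the submodule’s current branch. Either way, the new pointers still have to be committed in the umbrella project, as described above.
git submodule update --remote --merge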
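Here is a sketch of the --merge case. It assumes local bare repositories stand in for GitHub and that a hypothetical teammate pushes to umbrella-api while the api submodule has an unpushed local commit; make_remote, NOTES.md, and VERSION are illustrative. GIT_MERGE_AUTOEDIT=no only keeps the merge from opening an editor.

```shell
#!/bin/sh
# Sketch: local bare repositories stand in for the GitHub remotes.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p remotes

# Keep all Git config inside the sandbox (the identity is made up).
export HOME="$work"
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
# Git 2.38.1+ blocks local-path submodules unless this is allowed.
git config --global protocol.file.allow always
# Accept the default merge message instead of opening an editor.
export GIT_MERGE_AUTOEDIT=no

# Hypothetical helper: create a bare "remote" with one commit on master.
make_remote() {
  git init -q --bare "$work/remotes/$1.git"
  git clone -q "$work/remotes/$1.git" "$work/seed-$1"
  (cd "$work/seed-$1" &&
    echo "# $1" > README.md &&
    git add README.md &&
    git commit -q -m "Initial commit" &&
    git push -q origin master)
}

make_remote umbrella
make_remote umbrella-api

git clone -q "$work/remotes/umbrella.git" "$work/umbrella"
cd "$work/umbrella"
git submodule add "$work/remotes/umbrella-api.git" api
git commit -q -m "Add api submodule"

# Local, unpushed work on the api submodule's master branch.
(cd api && git checkout -q master &&
  echo "todo" > NOTES.md && git add NOTES.md &&
  git commit -q -m "Add notes")

# Meanwhile a hypothetical teammate pushes to umbrella-api.
git clone -q "$work/remotes/umbrella-api.git" "$work/teammate-api"
(cd "$work/teammate-api" &&
  echo "v2" > VERSION && git add VERSION &&
  git commit -q -m "Add VERSION" && git push -q origin master)

# Merge upstream into the checked-out branch instead of detaching HEAD.
git submodule update --remote --merge
git -C api symbolic-ref --short HEAD
# The pointer changed, but it is not committed in the umbrella yet.
git status --short
```

The submodule stays on master with both the local and the upstream commits. git status in the umbrella shows api as modified until the new pointer is committed.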
Now our umbrella project points to the latest commits in both submodules, umbrella-ui and umbrella-api.
Have you used Git Submodules in a similar umbrella structure before? If not, I hope it sparks your interest and you give it a try.
Thank you for reading, liking, and sharing my Git Explained series posts. It truly means a lot to me. I enjoy putting them together and love your questions and suggestions.
Check out my other Git Explained posts and keep an eye out for the next one. I explore, define, and explain Git concepts every Saturday!