
Git Submodules Revisited

Dave Cridland ・3 min read

Git's submodules are so universally derided that there's practically an entire industry devoted to providing alternatives for managing dependencies.

But like anything in git, it's often worth giving the man-pages a good going-over to figure out whether there are options that do what you want, or whether things have improved lately.

What I want

So, Metre is my exemplar project. It's got a slew of submodules, in part because some of our customers run (really) ancient versions of Linux and so we're going to need to statically link. Yay, fun!

But that means managing and shipping our own build of OpenSSL, for example - and that's a terrifying prospect for our Security Guy (lovely chap called Simon). It's pretty terrifying for me, too, actually.

In practical terms, then, our release cycle involves advancing along a stable branch on all the submodules, such that we're confident that we've picked up any bugfixes. This needs to be as simple as possible - really, a single command we can run as we need to.

But, we want to have high confidence that checking out a particular commit hash of Metre will give us the same dependencies we built with.

Git Submodule Add

Initially, I went for git submodule and a lot of manual work. I (lead dev) wasn't happy with this. Simon The Security Guy wasn't happy with this. Pete, one of our senior devs, conducted a full review of the project and highlighted it too.

The problem is that one slip and a dependency could be left with a serious security issue in. And Metre is meant to be all about security.

The plus-side of git submodule is that it tracks the commit hashes of submodules, and you can check them all out at the right hash with either a git clone --recursive or a git submodule update --init --recursive.

We considered switching to something else, but then we'd lose much of the built-in smarts of git submodule, and that's also a pain.

Oh, look - branches!

A deep dive into man git-submodule and man 7 gitsubmodules, however, found me gold.

First, there's a -b branch switch to git submodule add. That adds the submodule at a specific branch, and moreover sets the "tracking branch" - the one git normally pulls from - to the remote origin branch just as you'd normally do.

Second, I found a config option of submodule.{submodule name}.branch, which stores this. This isn't quite as great as you'd think, though, because while you can set this in git config for the repository, it's not tracked.

Fear My Editor Skillz

However, submodule configuration is stored in the repository in the .gitmodules file at the top. So you can edit that file, find the section, and simply add a branch key right there:

[submodule "deps/spiffing"]
        path = deps/spiffing
        url = http://github.com/surevine/spiffing
[submodule "deps/openssl"]
        path = deps/openssl
        url = git://git.openssl.org/openssl.git
        branch = OpenSSL_1_1_0-stable

The default is master (newer versions of git fall back to the remote's HEAD instead), so if that's all you wanted, you've got that already.
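
If you'd rather not hand-edit the file, the same key can be written with git config. This is only a sketch against throwaway local repositories: the deps/dep path and the stable branch are made-up stand-ins rather than Metre's real layout, and the protocol.file.allow override is there purely because the stand-in upstream is a local directory.

```shell
#!/usr/bin/env bash
# Sketch: record a tracking branch for a submodule in .gitmodules
# from the command line. All names and paths are throwaway stand-ins.
set -euo pipefail

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A stand-in "upstream" dependency that has a stable branch.
git init -q "$work/dep"
git -C "$work/dep" commit -q --allow-empty -m "initial"
git -C "$work/dep" branch stable

# The parent project, adding the dependency as a submodule.
git init -q "$work/parent"
cd "$work/parent"
# protocol.file.allow is only needed because the upstream is a local path.
git -c protocol.file.allow=always submodule add -q "$work/dep" deps/dep

# Same effect as adding "branch = stable" under [submodule "deps/dep"] by hand.
git config -f .gitmodules submodule.deps/dep.branch stable

recorded=$(git config -f .gitmodules --get submodule.deps/dep.branch)
echo "tracking branch: $recorded"
```

Either way, .gitmodules is an ordinary tracked file, so the change still needs committing in the parent. Git 2.22 and later also have git submodule set-branch --branch stable -- deps/dep, which writes the same key.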

Updating Branches

The normal command for updating submodules is git submodule update. There's three flags of interest:

--init performs a git submodule init if the submodule isn't already cloned into place, and nothing otherwise - so it's always safe to use.

--recursive recurses through each submodule, running the same git submodule update command in each.

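--remote is the magic - instead of checking out the commit recorded in the parent, it fetches in each submodule and checks out the tip of the remote tracking branch, much as a git pull would. It's this that we want.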
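
To see the difference --remote makes, here's a small experiment. It's a sketch only: the repositories are throwaway stand-ins created in a temp directory, deps/dep and stable are invented names, and the protocol.file.allow override is needed only because the stand-in upstream is a local path.

```shell
#!/usr/bin/env bash
# Sketch: compare "git submodule update" with and without --remote.
set -euo pipefail

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Only needed because the stand-in upstream is a local path rather than a URL.
allow_local=(-c protocol.file.allow=always)

# Stand-in upstream dependency, sitting on a "stable" branch.
git init -q "$work/dep"
git -C "$work/dep" commit -q --allow-empty -m "release 1"
git -C "$work/dep" checkout -q -b stable

# Stand-in parent project tracking that branch.
git init -q "$work/parent"
cd "$work/parent"
git "${allow_local[@]}" submodule add -q -b stable "$work/dep" deps/dep
git commit -q -m "Add deps/dep on stable"
pinned=$(git -C deps/dep rev-parse HEAD)

# Upstream publishes a bugfix on the stable branch.
git -C "$work/dep" commit -q --allow-empty -m "bugfix"
upstream_tip=$(git -C "$work/dep" rev-parse stable)

# Without --remote: the submodule stays at the commit the parent recorded.
git "${allow_local[@]}" submodule update --init --recursive
after_plain=$(git -C deps/dep rev-parse HEAD)

# With --remote: fetch, then check out the tip of origin/stable.
git "${allow_local[@]}" submodule update --init --recursive --remote
after_remote=$(git -C deps/dep rev-parse HEAD)

echo "pinned=$pinned plain=$after_plain remote=$after_remote"
```

The plain update leaves the submodule on the pinned commit; only the --remote run moves it to the upstream tip.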

Workflow Summary

So now the workflow looks like this:

git clone --recursive git@github.com:surevine/metre - clones the repository and checks out the HEAD of master.

git checkout foo - checks out the foo branch or commit - and, with submodule.recurse set (see the end of this list) or after a plain git submodule update, will switch the submodules to the commits recorded for foo.

git submodule update --init --recursive --remote - updates all submodules recursively along their tracking branches. Without the --remote, it'll reset the submodule working directories to the "right" commit for the parent.

Finally, you can:

git config submodule.recurse true - tells git that most commands should act recursively, in particular git pull.
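
One step the summary leaves implicit is that after a --remote update the parent still has to commit the new submodule pointers. That commit is what makes a given Metre hash reproduce the same dependencies. Below is a sketch of a release-time bump under invented names (deps/dep, stable, throwaway repositories in a temp directory); the protocol.file.allow override is only there because the stand-in upstream is a local path.

```shell
#!/usr/bin/env bash
# Sketch: bump a submodule along its branch, then pin the result in the parent.
set -euo pipefail

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Only needed because the stand-in upstream is a local path rather than a URL.
allow_local=(-c protocol.file.allow=always)

# Stand-in upstream dependency, sitting on a "stable" branch.
git init -q "$work/dep"
git -C "$work/dep" commit -q --allow-empty -m "release 1"
git -C "$work/dep" checkout -q -b stable

# Stand-in parent project with the dependency tracked on that branch.
git init -q "$work/parent"
cd "$work/parent"
git "${allow_local[@]}" submodule add -q -b stable "$work/dep" deps/dep
git commit -q -m "Add deps/dep on stable"

# Upstream ships a bugfix on stable.
git -C "$work/dep" commit -q --allow-empty -m "bugfix"
upstream_tip=$(git -C "$work/dep" rev-parse stable)

# Release-time bump: advance along the tracking branch...
git "${allow_local[@]}" submodule update --init --recursive --remote
git status --short    # deps/dep now shows up as modified
# ...then commit the new submodule pointer so this parent hash pins it.
git add deps/dep
git commit -q -m "Bump deps/dep along stable"
recorded=$(git rev-parse HEAD:deps/dep)

# A fresh recursive clone of the parent gets exactly the recorded commit.
git "${allow_local[@]}" clone -q --recursive "$work/parent" "$work/fresh"
fresh=$(git -C "$work/fresh/deps/dep" rev-parse HEAD)
echo "recorded=$recorded fresh=$fresh"
```

The bump and the pin are separate commands here so the change shows up in git status, and can be reviewed, before it is committed.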

Posted on May 12 '18 by:

Dave Cridland

@dwd

Open source and open standards for mission-critical code.

Discussion

 

I have been working on a large multi-year project and had it split into seven or more repositories for each component. Since submodules "sucked" (or did back then), I invented my own dependency management tool.

After about two years I have recently converted it all back into a mono-repository because it's so much less work to maintain:

  1. Just use normal git commands.
  2. If you update a dependency, check it in as-is.
  3. The state of the repository is the state of your dependencies, just git clone to start.
  4. You can use normal git tools or GUIs (such as Source Tree) and merge/diff tools.
  5. Tools and views on GitHub or similar just work.
  6. Reduces engineer drag (you spend less time fiddling with dependencies).

The conversion back to a mono-repository required me to write a fairly complex bash script, mostly using git commands (great way to get more experience in git). I wanted to merge the history for all the repositories into one linear history, including tags (but fortunately only one branch).

I made the script repeatable (it leaves the original repositories alone) so I could check that script into git itself and develop it slowly until the results were correct.

Here's some of it:

    echo "Cherry picking $merge_branch into $splice_branch ..."
    git rev-list --no-merges --reverse $start_splice_branch..$splice_branch | \
    git cherry-pick --stdin -x --allow-empty --strategy=recursive -X theirs

See Stackoverflow - Splice unrelated repositories

Having said that, submodules or something similar may be useful for external dependencies, though every company I have worked for just checks the dependencies into a mono-repository.

 

What I've seen most about git submodules is that people expect them to behave in a way that doesn't match how Git itself behaves.
A submodule is a git repo in a git repo, plain, simple.
You want a specific version of this dependency: just "git checkout".
You want a version by tag, create the tag, and "git checkout".
You made fixes in the main project AND in the dependency:

  • dependency repo: git commit, git push
  • main repo: git commit, git push

...exactly what you "should" do if the dependency repo was separated...

I don't get what people find so hard to work with...

 

For us, the problem wasn't when committing individual dependencies, it was when trying to track along branches sanely.

But in more general terms, while I know that git submodule works more or less like there's just a totally independent checkout underneath, that is the problem most people have - the feeling that it's a bolted-on feature rather than properly integrated.

 

Can you elaborate on "properly integrated"? I fail to see the need that Git submodule doesn't meet.
..and I'd like to know ;)

 

We've recently picked up git submodule for our modularized project. The whole branching/versioning of submodules was the reason why I pressed forwards with using them.

Interesting how you didn't notice the branching in the overview of the documentation haha.

 

No, I didn't find that section - I found the sections on --branch and --remote, and man 7 gitsubmodules has no mention of branch at all as far as I can see.

If there's something more, I'd love a pointer!

 

On the opening page it briefly shows the branch config option:
git-scm.com/docs/git-submodule#git...

[submodule "<path>"]
    path = <path>
    url = <repo>
    branch = <branch> # <-- magical

We use branches as versions, so each minor change to a submodule increments the branch property. We then run:

git submodule sync
git pull --recurse-submodules

...to make sure that everything is updated. You're right in saying that it doesn't seem too well documented though.

 

Thanks for this. When I was looking into git submodules I found very few resources truly going deeper than "meh, I wouldn't use them".

 

I don't get why more people don't use submodules. The source repo updates, you update your submodule and it gets used in your code. You don't even HAVE to update submodules, but you still get the benefit of having the exact commit hash of the version you're using, whether it's development or stable. Why would you NOT want to do that?

 

It may be okay for third-party dependencies. But when you add something your team actively develops (like a library shared between projects) then submodules may become a hassle.

Switching git branches in the main repo does not update the submodules. When you run git status in the main repo it does not show what changed in the submodules. In general, all git commands are now current-directory-aware. It becomes a hassle. And that's before you even get to code review, where with web UIs like GitHub/GitLab/Bitbucket, if you want to change both the submodule and the main repo, you have to create several separate pull requests.

And now, drop transitive dependencies into the picture.

I think you're mixing up submodules and pull requests. Pull requests are the "big" issue for productivity, not submodules. So what you describe is the result of an ineffective pull request style.

Using submodules, like any tool, has side effects. The hard one (to me) is picking the wrong branch, one that can be freely rebased. That can make a commit disappear.

 

Yeah, that was pretty much the conversation at Surevine. I was actually thinking of digging into the Git source code and making it do what we needed, so finding this has saved the world from my code.

 

What's keeping me from using submodules mainly is not being able to reference commits by tags. It may be viewed as sheer laziness, but tags denote versions imho, and using branches just creates a lot of space for unintended errors, and don't get me started on commit hashes. :)

EDIT: sorry, I kept the main thing in my head only. So, the thing is, I want to keep it explicit which versions of deps my code relies on. That's why I consider tags a would-be ideal option.

 

When you checkout a project's commit, you also checkout the dependency commits at that point - the commit of the submodule is part of the version history of the parent.

But a git tag of the parent doesn't create a tag in the child repositories - that's probably a good thing, though.

 

That's something different than I had in mind. The thing is, when I want to manage or view versions of submodules, I'd like to use their respective version tags.