DEV Community

Git Submodules Revisited

Dave Cridland on May 12, 2018

Git's submodules are so universally derided that there's practically an entire industry devoted to providing alternatives for managing dependencies...
 
Harvey Thompson

I have been working on a large multi-year project and had it split into seven or more repositories, one per component. Since submodules "sucked" (or did back then), I invented my own dependency management tool.

After about two years I have recently converted it all back into a mono-repository because it's so much less work to maintain:

  1. Just use normal git commands.
  2. If you update a dependency, check it in as-is.
  3. The state of the repository is the state of your dependencies, just git clone to start.
  4. You can use normal git tools or GUIs (such as Source Tree) and merge/diff tools.
  5. Tools and views on GitHub or similar just work.
  6. Reduces engineer drag (you spend less time fiddling with dependencies).

The conversion back to a mono-repository required me to write a fairly complex bash script, mostly using git commands (great way to get more experience in git). I wanted to merge the history for all the repositories into one linear history, including tags (but fortunately only one branch).

I made the script repeatable (it leaves the original repositories alone) so I could check that script into git itself and develop it slowly until the results were correct.

Here's some of it:

    echo "Cherry picking $merge_branch into $splice_branch ..."
    # Replay the non-merge commits oldest-first; on conflicts, prefer the picked commit's side.
    git rev-list --no-merges --reverse "$start_splice_branch..$splice_branch" | \
    git cherry-pick --stdin -x --allow-empty --strategy=recursive -X theirs

See Stack Overflow - Splice unrelated repositories

That said, submodules or something similar may be useful for external dependencies, although every company I have worked for just checks its dependencies into a mono-repository.

dallgoot

What I've seen most about git submodules is that people expect them to behave in a way that doesn't match how Git itself behaves.
A submodule is a git repo in a git repo, plain and simple.
You want a specific version of this dependency: just "git checkout" it.
You want a version by tag: create the tag, then "git checkout" it.
You made fixes in the main project AND in the dependency:

  • dependency repo: git commit, git push
  • main repo: git commit, git push

...exactly what you "should" do if the dependency repo were separate...
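
A minimal sketch of that two-step flow, using throwaway local repositories. The names `dep`, `main` and `libs/dep` are made up for illustration, and the push steps are left as comments because there is no real remote here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every name below (dep, main, libs/dep) is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the dependency's upstream repository.
git init -q dep
git -C dep commit -q --allow-empty -m "dep: initial"

# The main project, with the dependency added as a submodule.
git init -q main
cd main
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/dep" libs/dep
git commit -q -m "main: add dep submodule"

# 1. Dependency repo: commit the fix (git push would follow with a real remote).
git -C libs/dep commit -q --allow-empty -m "dep: fix"

# 2. Main repo: stage the new submodule commit, then commit (and push).
git add libs/dep
git commit -q -m "main: bump dep"

# The parent records the submodule as a commit hash in its tree.
recorded=$(git ls-tree HEAD libs/dep | awk '{print $3}')
actual=$(git -C libs/dep rev-parse HEAD)
echo "recorded=$recorded actual=$actual"
```

The last two commands compare the hash stored in the main repo's tree with the submodule's current HEAD; after step 2 they should match.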

I don't get what people find so hard to work with...

Dave Cridland

For us, the problem wasn't committing to individual dependencies; it was trying to track along branches sanely.

But in more general terms, while I know that git submodule works more or less like there's just a totally independent checkout underneath, that is the problem most people have - the feeling that it's a bolted-on feature rather than properly integrated.

dallgoot

Can you elaborate on "properly integrated"? I fail to see what need Git submodules don't meet.
..and I'd like to know ;)

ilammy

It may be okay for third-party dependencies. But when you add something your team actively develops (like a library shared between projects) then submodules may become a hassle.

Switching git branches in the main repo does not update the submodules. When you run git status in the main repo it does not show what changed in the submodules. In general, all git commands are now current-directory-aware. It becomes a hassle. And that is before you get to code review, where with web UIs like GitHub/GitLab/Bitbucket, if you want to make a change to the submodule and to the main repo, you have to create several separate pull requests.
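
A small sketch of the branch-switching behaviour, again with throwaway local repositories (`dep`, `main`, `libs/dep` and the `feature` branch are hypothetical names). It assumes the default setting where `submodule.recurse` is off, which the script passes explicitly so the result does not depend on local configuration.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q dep
git -C dep commit -q --allow-empty -m "dep: v1"

git init -q main
cd main
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/dep" libs/dep
git commit -q -m "main: dep at v1"
base=$(git rev-parse --abbrev-ref HEAD)
v1=$(git -C libs/dep rev-parse HEAD)

# On a feature branch, move the submodule forward and record it.
git checkout -q -b feature
git -C libs/dep commit -q --allow-empty -m "dep: v2"
git add libs/dep
git commit -q -m "main: dep at v2"
v2=$(git -C libs/dep rev-parse HEAD)

# A plain checkout switches the main repo but leaves the submodule where it was.
git -c submodule.recurse=false checkout -q "$base"
after_plain=$(git -C libs/dep rev-parse HEAD)

# An explicit update moves the submodule to the commit the main repo records.
git submodule --quiet update
after_update=$(git -C libs/dep rev-parse HEAD)

echo "v1=$v1 v2=$v2"
echo "after plain checkout: $after_plain"
echo "after submodule update: $after_update"
```

Git also has `git checkout --recurse-submodules` and the `submodule.recurse` config setting, which fold the update into the checkout itself.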

And now, drop transitive dependencies into the picture.

NhatKhai • Edited

I think you are mixing up submodules and pull requests. Pull requests are the "big" issue for productivity, not submodules. So what you describe is the result of an ineffective pull request style.

Using submodules, like any tool, has side effects. The hard one (to me) is pointing at the wrong branch, one that can be rebased freely. That can make a commit disappear.

Dexter Marks-Barber

We've recently picked up git submodule for our modularized project. The whole branching/versioning of submodules was the reason why I pressed forward with using them.

Interesting how you didn't notice the branching in the overview of the documentation haha.

Dave Cridland

No, I didn't find that section - I found the sections on --branch and --remote, and man 7 gitsubmodules has no mention of branch at all as far as I can see.

If there's something more, I'd love a pointer!

Dexter Marks-Barber

On the opening page it briefly shows the branch config option:
git-scm.com/docs/git-submodule#git...

    [submodule "<path>"]
        path = <path>
        url = <repo>
        branch = <branch>  # <-- magical

We use branches as versions, so each minor change to a submodule increments the branch property. We then run:

    git submodule sync
    git pull --recurse-submodules

...to make sure that everything is updated. You're right in saying that it doesn't seem too well documented though.
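
A hedged sketch of bumping the `branch` setting and following it, using throwaway local repositories. The branch names `v1.0` and `v1.1` and the path `libs/dep` are invented, and the sketch uses `git submodule update --remote` rather than `git pull`, because the sandbox's main repo has no remote to pull from.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream dependency with two "version" branches (hypothetical names).
git init -q dep
git -C dep commit -q --allow-empty -m "dep: 1.0"
git -C dep branch v1.0
git -C dep commit -q --allow-empty -m "dep: 1.1"
git -C dep branch v1.1

git init -q main
cd main
# -b writes "branch = v1.0" into .gitmodules; protocol.file.allow is only
# needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add -b v1.0 "$work/dep" libs/dep
git commit -q -m "main: track dep v1.0"

# Bump the tracked branch in .gitmodules, then move to the tip of that branch.
git config -f .gitmodules submodule.libs/dep.branch v1.1
git submodule --quiet sync
git -c protocol.file.allow=always submodule --quiet update --remote
git add .gitmodules libs/dep
git commit -q -m "main: track dep v1.1"

tracked=$(git config -f .gitmodules submodule.libs/dep.branch)
pinned=$(git -C libs/dep rev-parse HEAD)
upstream=$(git -C "$work/dep" rev-parse v1.1)
echo "tracked=$tracked pinned=$pinned upstream=$upstream"
```

Note that the main repo still commits a specific hash for `libs/dep`; the `branch` entry only tells `update --remote` which upstream branch to follow.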

Pavel Janečka • Edited

What's keeping me from using submodules mainly is not being able to reference commits by tags. It may be viewed as sheer laziness, but tags denote versions imho, and using branches just creates a lot of space for unintended errors, and don't get me started on commit hashes. :)

EDIT: sorry, I kept the main thing in my head only. So, the thing is, I want to keep it explicit which versions of deps my code relies on. That's why I consider tags a would-be ideal option.

Dave Cridland

When you check out a project's commit, you also check out the dependency commits at that point - the commit of the submodule is part of the version history of the parent.

But a git tag of the parent doesn't create a tag in the child repositories - that's probably a good thing, though.

Pavel Janečka

That's something different than I had in mind. The thing is, when I want to manage or view versions of submodules, I'd like to use their respective version tags.
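
One possible way to at least view submodule versions by tag is sketched below, with throwaway local repositories. The tag names `v1.0` and `v1.1` and the path `libs/dep` are made up, and this only reports tags; the parent repository still stores a commit hash.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream dependency with two tagged versions (hypothetical names).
git init -q dep
git -C dep commit -q --allow-empty -m "dep: 1.0"
git -C dep tag v1.0
git -C dep commit -q --allow-empty -m "dep: 1.1"
git -C dep tag v1.1

git init -q main
cd main
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/dep" libs/dep

# Pin the dependency by checking out a tag inside the submodule.
git -C libs/dep checkout -q v1.0
git add libs/dep
git commit -q -m "main: pin dep to v1.0"

# Report each submodule's checked-out commit in terms of its own tags.
versions=$(git submodule --quiet foreach 'echo "$sm_path $(git describe --tags)"')
echo "$versions"
```

`git describe --tags` prints the bare tag name only when the checked-out commit is exactly a tagged one; otherwise it appends a distance and an abbreviated hash.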

Ben Halpern

Thanks for this. When I was looking into git submodules I found very few resources truly going deeper than "meh, I wouldn't use them".

Dave Cridland

Yeah, that was pretty much the conversation at Surevine. I was actually thinking of digging into the Git source code and making it do what we needed, so finding this has saved the world from my code.

Aaron Berry

Thanks for your post, a really great breakdown of how you would use submodules in a project structure, which never seems to get covered. I really think in most cases git submodules are a nice simple way to manage project dependencies without adding extra tooling and complexity.