Kenneth Gunnerud

Posted on Jul 31, 2020 • Edited on Nov 1, 2021

Monorepo with Java, Maven and GitHub Actions, including basic example

#java #maven #github #monorepo

Edit 18.06.2021: Updated with a few notes at the bottom

Edit 30.07.2021: Updated the GitHub example with a GitLab example, and renamed common-util to base-framework, since we standardized on that name at my current project (and I like it better than common-utils -.-). More notes at the bottom.

Example code can be found here.

Monorepos have come up from time to time in discussions, especially since many of the big companies use this technique to structure their code. The last project I was on is nowhere near that big, and we also decided not to take it as far as many others have. This post is about how we did it, how it worked for us, and what pitfalls we ran into.

First off, we moved to a monorepo while the organization was slowly moving code over to GitHub. At the same time, GitHub announced GitHub Actions and GitHub Package Registry. Before this, we had a typical Bitbucket + Jenkins (per team) + Nexus setup, all behind Citrix and inaccessible from the internet. The organization itself has a few hundred developers, but since every team has a lot of freedom when it comes to these choices, we decided to try a monorepo.

We only structured OUR product as a monorepo, not the whole organization. So this was not an organizational choice, but a team choice.

Reasons:

Atomic changes. Pull requests with changes in many places, e.g. contract + producer + consumer + documentation. Before, we had to make 4 pull requests, usually days apart, which meant reviewers had to go back and forth between what had been done in the contract and the producer when doing QA of the consumer code, and then verify the documentation.
Easier to search the code base for examples.
Common code / reuse. It is very easy to create common code in a monorepo; just make sure to have good conventions. E.g. we used Spring Boot's Auto Configuration in our common libraries and did not create too many new Maven modules. Instead, we bundled many "features" into that common library, with their transitive dependencies marked as provided. I'll add an example of this in the GitHub example.
Management of dependencies. We were very invested in Spring Boot, and new updates usually only reached applications that had changes, not the "stale" ones. With a monorepo, we only updated the parent pom and triggered a full redeploy (usually manually, since we don't have that many apps). This made it easy to stay on the latest dependencies all the time.
Changes could be traced across applications to the same commits. This made it easier to handle faults where multiple applications had been changed as part of the same feature.
Large-scale refactoring. When we decided to switch our formatting conventions to the IntelliJ defaults, we did it for the whole product at once. We did other refactorings as we went along, but none that hit as many applications and libraries that hard.
One place for code, tools, contracts and documentation. Yes, we even moved our documentation from Confluence to GitHub, using Asciidoc and the Asciidoctor Maven Plugin to update a GitHub Pages site whenever the documentation changed. We had to get our non-technical people to learn Asciidoc, and they caught on faster than anticipated.

Our situation:

Mostly an independent project in the beginning, so few teams depended on us. But the end result is a product that basically everyone in our organization has to consume.
At most 12 developers (I think).
About 20 different applications within our product.
Very autonomous with a high degree of freedom for our team.
Everything on Kubernetes and everyone could deploy to production.
Trunk-based development. PR to master, deploy to dev and prod if the tests go green; optionally deploy only to dev if you are on a branch named dev/*.
We don't share our Java POJOs with other teams, since that creates a dependency on every field in the contract, not just the ones you use. It's up to the consumers to implement their own POJOs with just the fields they require, following the tolerant reader pattern: only fail on breaking changes in the things you depend on. Internally, we do share our own contracts, since every contract change affects producers and consumers anyway and triggers redeploys. So we just made one module for all our internal contracts (contract-json / contract-avro), though it adds a bit of overhead to the build if they get very big.
We changed our test strategy from a traditional test pyramid, with unit tests at class level, component tests that started the context, and integration tests, to one inspired by Spotify. The reason was that since we were iterating rather fast, we often changed or refactored code, and this always led to people pulling their hair out over existing unit tests. We moved to tests more at the level of: input + state = output, i.e. given/when/then at a more functional level, since that matched the use cases within every application. This worked great for us, especially for iteration speed, since most changes did not change the functionality; they just added to it or refactored the code.
We removed the use of external components in our tests. E.g. Embedded Kafka was often used, but it took a lot of time to run and was often error prone (race conditions, for example). We could have fixed that, but we mostly got nothing out of these tests except testing Spring Kafka: our regular, more functionally oriented tests hit the same listener anyway, so the only difference was whether Spring Kafka or the test itself called our listener. There was more need for these kinds of tests when we were new to Kafka, but as we became more familiar with it, these tests never caught faults that would happen in the environment. I wish we hadn't added Embedded Kafka to all our apps, but only to one or two for learning purposes. You learn as you go.
Almost no manual tests (there were some on the front end).
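To make the tolerant reader idea concrete, here is a minimal sketch in plain Java. It assumes the message has already been parsed into a Map (to avoid pulling in a JSON library), and OrderView with its field names is hypothetical, not from our code base. The consumer keeps only the two fields it uses and fails only when one of those is missing or has the wrong type, so the producer is free to add fields.

```java
import java.util.Map;

/** Hypothetical consumer-side view of a producer's "order" event: only the fields this consumer uses. */
final class OrderView {
    private final String orderId;
    private final long amountInCents;

    private OrderView(String orderId, long amountInCents) {
        this.orderId = orderId;
        this.amountInCents = amountInCents;
    }

    /**
     * Tolerant reader: every field we don't use is ignored; we only fail if a field
     * we actually depend on is missing or has an unexpected type (a breaking change for us).
     */
    static OrderView fromMessage(Map<String, Object> message) {
        Object id = message.get("orderId");
        Object amount = message.get("amountInCents");
        if (!(id instanceof String)) {
            throw new IllegalArgumentException("Breaking change: 'orderId' missing or not a string");
        }
        if (!(amount instanceof Number)) {
            throw new IllegalArgumentException("Breaking change: 'amountInCents' missing or not a number");
        }
        return new OrderView((String) id, ((Number) amount).longValue());
    }

    String orderId() { return orderId; }

    long amountInCents() { return amountInCents; }
}
```

With Jackson, the "ignore everything else" part is typically done with @JsonIgnoreProperties(ignoreUnknown = true) on the consumer's own POJO.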

Tips:

Check out the tooling before starting with a monorepo! E.g. you don't want to build the whole project on every change. GitHub Actions supports path-scoped workflows (e.g. paths: apps/app1/**). This saved us, but I have seen people write their own shell scripts that check the git commit log.
Use a CODEOWNERS file to automatically assign reviewers with specific domain/application knowledge. Code owners can be scoped by path, e.g.: /apps/app1/ @GitHubUser1. See the GitHub docs. There is also an example in the provided code.
Decide on some conventions early: code style, formatting, monorepo conventions, testing, etc. We went pretty heavy on Spring Boot and some basic conventions. This made it easier for anyone to jump into any application; even in cases where Spring Boot was too heavy, all code and applications were understandable to everybody (of course, you still had to learn what the app did, but the style and the libraries were the same).
I've heard Gradle might be better for monorepos; Bazel definitely is, but that was too big a leap for us at the time.
If you want integration tests, maybe they can be of a more common nature? Often you use an abstraction on top, e.g. Hibernate, Spring Kafka or the Kafka client, so instead of re-testing these libraries in every application, you could make one test module per technology (e.g. a test/kafka-integration module) using e.g. Testcontainers, if you don't feel comfortable without these tests. These would not have to run on every application change, but rather ad hoc, or when upgrading third-party dependencies such as Spring Kafka. I won't cover this here.
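For setups without path-scoped workflows, the "check what changed" approach from the first tip could look roughly like this. AffectedApps is a hypothetical helper, and its rules are assumptions: a file under apps/<name>/ selects that app, anything under libs/ or the root pom.xml selects every app (the same "libs change triggers everything" rule mentioned in the comments), and docs/ or .tools/ select nothing. The list of changed files would come from something like git diff --name-only.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: decide which apps to build from the files changed in a commit. */
final class AffectedApps {
    static Set<String> affected(List<String> changedFiles, Set<String> allApps) {
        Set<String> result = new TreeSet<>();
        for (String path : changedFiles) {
            if (path.startsWith("libs/") || path.equals("pom.xml")) {
                // Shared code or the parent pom changed: assume every app may be affected.
                result.addAll(allApps);
            } else if (path.startsWith("apps/")) {
                // apps/<name>/... selects only that app.
                String[] parts = path.split("/");
                if (parts.length > 2 && allApps.contains(parts[1])) {
                    result.add(parts[1]);
                }
            }
            // Anything else (docs/, .tools/, .github/ ...) triggers no app build in this sketch.
        }
        return result;
    }
}
```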

Let's get to it:

First, decide on a convention for the monorepo. In this example, the format will be:

root
 .github
 .tools
 apps
   bar
   baz
   foo
 docs
 libs
   utils-common (changed to base-framework)
   contracts-json

.github

Contains the workflows for GitHub Actions and a config file for Dependabot. Dependabot is awesome.

Explanation of the workflows:
We had one per app. I think this could be shortened, but it works for us.

To build, we use: mvn clean package --projects :bar --also-make --threads=2 --batch-mode
This means: run this workflow for project :bar (only that app). One commit can trigger many workflows, e.g. changes to foo and bar will trigger the workflows for both, in parallel.
Next, --also-make (or -am) also builds the modules within the project that the selected project depends on, so this will build common-utils (changed to base-framework) and contract-json, but not docs or other modules in libs that bar doesn't depend on. See the documentation here.
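To illustrate what --also-make selects, here is a small sketch that walks a module graph from the requested project to everything it transitively depends on inside the reactor. The graph is a hypothetical stand-in for the poms in the example repo. Maven derives it from the pom declarations and also sorts the selected modules into build order, which this sketch does not do.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the selection made by `--projects :bar --also-make` (selection only, no build ordering). */
final class AlsoMake {
    static Set<String> select(String module, Map<String, List<String>> dependsOn) {
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(module);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            // Visit each module once, then queue its in-reactor dependencies.
            if (selected.add(current)) {
                for (String dep : dependsOn.getOrDefault(current, List.of())) {
                    toVisit.push(dep);
                }
            }
        }
        return selected;
    }
}
```

External dependencies such as Spring Boot are not part of this graph; they are resolved from a repository as usual.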

We went with 2 threads because, at the moment, that's how many CPU cores the GitHub Actions runners have.

Batch mode keeps Maven from printing every KB downloaded to the console when downloading dependencies.

.tools

Contains different tools for your product, e.g. we had a lot of IntelliJ .http files for calling our APIs. These also served as examples for other teams on how to call our APIs, but were mostly for our own use.

apps

All applications. I have seen examples where people use components or services etc., but for us, apps was what we went for.

Every app depends on libs/common-utils (changed to base-framework), which contains code that is enabled/disabled based on what's on the class path and in the properties (through Spring Boot auto-configuration).
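The idea behind that shared library can be shown without Spring. This is a plain-Java illustration, not how Spring implements it: in Spring Boot you would use @ConditionalOnClass and @ConditionalOnProperty on an auto-configuration class. FeatureToggle and the property name used below are hypothetical. A feature turns on only if its library is on the class path and a property doesn't switch it off. Because the library's own dependencies are marked provided, they don't leak into apps transitively, so an app that doesn't add them keeps the feature off.

```java
import java.util.Properties;

/** Plain-Java analogue of "enable a feature if its class is present and a property allows it". */
final class FeatureToggle {
    static boolean isOnClassPath(String className) {
        try {
            // Load without initializing, just to check presence.
            Class.forName(className, false, FeatureToggle.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    static boolean enabled(String requiredClass, String property, Properties props) {
        // Enabled by default when the library is present, unless the property says "false".
        boolean propertyOn = Boolean.parseBoolean(props.getProperty(property, "true"));
        return propertyOn && isOnClassPath(requiredClass);
    }
}
```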

docs

Contains all the documentation, both internal and external. We even included postmortems, but kept them in a separate folder that was not published to GitHub Pages, and since our repo was private, only we could access them.

We published using the gh-pages branch, but it seems this is not available for every repository. Read here. The example is still up.

libs

All the common libraries.

We had multiple common libraries here, e.g. contracts that were only relevant for a few applications. We got these contracts as XSDs from external providers. To avoid every app having to build these contracts (as would have happened if we had put them in a common module), we made separate modules for them that were only included as dependencies in the apps that needed them.

Note

We had a module named test, for applications only relevant for tests (e.g. an application that creates Kafka topics before all the other apps start when running locally with docker-compose), and as a place for integration tests (as mentioned above; we never used it for that, but we could, and will, when the need arises).

We also had multiple other dot folders, like .docker, with all the files needed to run parts of the repo, or the whole repo, using Docker and docker-compose.

Our experience

After migrating most of our code base to a monorepo, developers often hated having to work on non-monorepo applications. The developer experience was, for us, just better and easier. This is probably not only because of the new structure of our code; GitHub Actions also solved many of the tooling problems that monorepos often have.

There was a lot of learning and culture change, especially since we could not pause our regular development while migrating, which somewhat split the team.

We migrated app by app, and this was a bit cumbersome in places. For example, we had some common libraries and contracts that were on Nexus and still heavily used. When we copied these to the monorepo, we basically had 2 places to maintain them (since some apps still depended on the ones in Nexus, and the migrated apps on those in the monorepo). This could have been solved more elegantly if all dependencies had been accessible from the internet, but our Nexus was on the internal network. If it had been accessible, we could have added a settings.xml file to resolve those dependencies from the monorepo builds, or the other way around; if the monorepo were the master, we would have had to publish the dependencies to GitHub Package Registry. Development on the apps that depended on Nexus was also done behind Citrix, so there were many walls to climb. We ended up duplicating for a while, and it worked fine, but it was an extra inconvenience.

Overall, the decision to migrate to a monorepo was the right one given our situation, and the team is happier for it. But many things had to change for it to work. E.g. our test strategy: it would have worked without changing it, but then we would have ended up with tons of Maven modules if we had done it the same way as before.

Edit/Update

It's been a while since this post was written, and I'm currently on a new project with a new customer. They had already used a monorepo to some degree, but it was, how can I say it, a bit overused or misused? E.g. there were a bunch of Maven modules for every kind of small lib you could imagine, so one Maven module could have a few classes and that was it. This, combined with a lack of code stewardship, made the codebase somewhat of a nightmare, and when I came in, they were already on the way to migrating everything into a new monorepo that used Kubernetes instead of Liberty.
This made me realize that code stewardship and shared ownership within the team over multiple years is a must. The last place I was at (described above) hasn't gotten that far yet; it's only a few years down the line for them, so these problems have yet to manifest themselves. So I'm updating this post as a heads-up! Luckily, we have a team that is now becoming more autonomous, which will hopefully make ownership and stewardship easier over time. But keep in mind that in a monorepo, the broken window theory might apply to the whole repo (instead of just the current one, if you have many).

Another note: at the new project we use GitLab instead. At least when I wrote the original post, GitHub Actions did not have a manual step, so we made a convention of creating a dev/** branch when we wanted things out in dev (I don't know if GitHub has this feature now). GitLab has the option to create a manual trigger. We use this to deploy applications to any test environment (yes, back to a few of those again, but still not that many) from any branch. I kind of like this; it's easier.
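The branch convention from the original GitHub Actions setup can be summarized as a small decision function. This is a sketch assuming a branch name and a "tests green" flag as input; DeployRules and Environment are hypothetical names, and in the real setup these rules lived in workflow conditions, not in Java.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the deploy convention: master -> dev + prod, dev/* -> dev only, anything else -> build only. */
final class DeployRules {
    enum Environment { DEV, PROD }

    static Set<Environment> targets(String branch, boolean testsGreen) {
        if (!testsGreen) {
            // Nothing is deployed unless the build and tests pass.
            return EnumSet.noneOf(Environment.class);
        }
        if (branch.equals("master")) {
            return EnumSet.of(Environment.DEV, Environment.PROD);
        }
        if (branch.startsWith("dev/")) {
            return EnumSet.of(Environment.DEV);
        }
        return EnumSet.noneOf(Environment.class);
    }
}
```

With the GitLab manual trigger, the last case would become a manual deploy job for any branch instead of no deploy at all.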

Edit/Update

A comment asked for a GitLab example, so I updated the sample based on what we do where I am now. It also uses the manual trigger for side branches (mentioned above). Another side note: we are migrating a lot of applications, and some of them lack a good test suite, so we also have manual triggers for prod deployments on those. This means we can specify which apps are "safe enough" for automatic deploys to prod and which are not. Our goal is to deploy automatically to prod when merging to master, but not all apps are up to it yet.

Appendix

Latest comments (8)

Rakia Ben Sassi • Apr 16 '22

Thanks for sharing your experiences Kenneth! I enjoyed your piece.

khalil la • Mar 16 '22

Hi Kenneth, thanks for your post. I have a plugin (github.com/khalilou88/jnxplus/tree...) that does the same using Nx (nx.dev/).

I'd like to know how you manage versioning between apps and libs. I see that you use project.version in the app pom.xml? For me, the lib changes first, so I didn't understand that.
Lib change -> lib new version -> app change -> app new version
Thanks

Kenneth Gunnerud • Mar 20 '22

Hello, we don't version the libs. When an application is built, it always builds the libs from trunk instead of depending on published versions. This also means that we don't need tools such as Nexus for libs (which is really nice, tbh).

Example:
App A is changed, triggers build of App A + all its dependencies within the project (including those in libs that it depends on). Meaning, no versioning of libs, just latest.

The only thing we are somewhat split on is whether all apps should be triggered when libs have changed, or only the apps that need to be redeployed (depending on what they use from libs). In Maven, it's also possible to use the reverse -pl command to trigger a build of all reactor modules that depend on the changed ones.

So, today we do (since apps are stateless):
Libs/** change: trigger ALL applications for build and deploy

Another option would be the reverse -pl command mentioned above
Or, just manually do it (e.g. in GitLab you can get manual button for all apps on libs change).
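As I understand it, the "reverse" option here is Maven's --also-make-dependents (-amd) used together with --projects: select the changed module plus everything that depends on it, directly or transitively. A small sketch of that selection, on a hypothetical module graph (apps and libs named after the example repo, with a made-up dependency from contracts-json to base-framework to show the transitive case):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the selection made by `--projects <lib> --also-make-dependents` (selection only). */
final class AlsoMakeDependents {
    static Set<String> select(String changed, Map<String, List<String>> dependsOn) {
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(changed);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (selected.add(current)) {
                // Walk the graph backwards: queue every module that depends on the current one.
                for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
                    if (e.getValue().contains(current)) {
                        toVisit.push(e.getKey());
                    }
                }
            }
        }
        return selected;
    }
}
```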

Never used nx.dev before, but thanks for the link :), I'll take a look.

Another note: we usually follow trunk-based development and don't release a product to others (like Elasticsearch, MongoDB, Flux, Kubernetes etc.), so versioning schemes such as SemVer make little sense, because it's always just the next version, never a new major version (maybe a new API with a v2, for example, but the applications or libs never go through traditional SemVer versioning). So the applications just get the short SHA plus a pipeline ID (just to have an incrementing number).

Siva Thangeswaran • Jul 27 '21

This is informative. Is it possible for you to extend the sample with Docker and GitLab support?

Kenneth Gunnerud • Jul 30 '21

Added a sample with GitLab support :) It's a bit pseudo, as I haven't tested it, but it's taken from a working project.

I wasn't sure what you wanted from a Docker sample, so I did nothing specific for it. Where I am now, we use Google Jib to build the Docker images that get deployed to Kubernetes. This is because we run GitLab runners in Kubernetes with Docker-in-Docker (DinD) disabled.

To get Jib up and running, we just add the Maven plugin, disable it by default (so Jib doesn't build images for e.g. libs, docs, etc.), and enable it for apps, along with some Jib properties, in a maven-incl-jib GitLab job that each app extends (app/*/gitlab.yml).

Siva Thangeswaran • Aug 2 '21

Thanks Kenneth, this is useful. And good to know about Jib, will try to explore.

tomb85 • Sep 26 '20

Great article. One thing I didn't get: what ensures that when one of the libs changes, all dependent apps are rebuilt?

Kenneth Gunnerud • Sep 27 '20 • Edited

Good question: nothing :O. We usually made one change in every app we wanted to update, because sometimes we wanted a more controlled rollout. For example, if app1 was critical and app2 was not, we made a change in app2 (usually in a file called buildtrigger) that was rolled out to test and production; if all went well, we changed a line in buildtrigger for app1 as well.

In practice, we usually deployed 10 of the non-critical apps on a dependency upgrade; if that went fine, we deployed the rest, which were customer facing. But I guess this depends on your requirements.
Another thing we sometimes did was create a new branch to deploy everything to dev, e.g. dev/dependency-upgrades, which triggered builds and deploys only to the dev environments. This worked because of a condition on the prod deploy (only deploy master to prod). Then we deployed all the apps (including the critical ones). If that went fine, we merged to master, which re-triggered a deploy to dev + prod (so dev got two deploys in our case, one from the dev/* branch and one from master). When merging to master, we usually squashed, since we sometimes ended up with a few commits fixing errors that occurred during the upgrade.

One option would be to create a workflow for the root Maven pom file that in turn uses a GitHub Actions workflow dispatch or repository dispatch to trigger the individual workflows.

Since we used a lot of Spring Boot, dependency updates were not frequent enough to be a big problem for us, but as a project grows beyond our size, I can see why alternatives to our solution would be wanted.