Posted on Aug 24, 2020 • Originally published at snyk.io

How 4 lines of Java code end up in 518772 lines in production.

#java #security #codequality

A few weeks ago I had the opportunity to give a presentation for the Dutch Java Conference JSpring. The talk was about Java dependency management.

During this talk, I created a simple Spring Boot application and determined the number of lines my java dependencies brought in versus the number of lines I wrote myself. This was to show that your dependencies occupy a large space in your application and need attention too. After the presentation, I published the following tweet that gained a lot of attention and incentivized me to draft this blog post.

Getting the number of lines my Java dependencies bring in

Step1: Generate a Spring Boot application

First of all, I created a Spring Boot application using the Spring Boot Initialzr with web as the only dependency.

Adding Java dependecies with Spring Initialzr
This basically means that my project contains spring-boot-starter-web as its only direct dependency.

Step2: Create a Rest controller

Next, I wrote the simplest 4-line REST-endpoint you could think of:

@RestController
public class Controller {

   @GetMapping("/hello")
   public String hello() {  return "hello"; }
}

Step 3: Build the deployable jar

The jar file that is created out of this Java project is a fat jar containing all the jars the project depends on. This is because Spring Boot basically includes the application server with the application itself.

Step 4: Analyse the jar

The next thing I did was to build a simple application that consumes the created Spring Boot jar. It unpacks the jar file and all of the dependencies it holds. All the class files within the jars are decompiled and the lines of code are counted.

Step 5: Collect the results

Lines of code written: 4
Jars: 33
Class files: 9917
Lines of code: 518772

Some notes on the numbers

The numbers shown above are far from accurate. White lines and lines with a single bracket are omitted so they did not show up in the final count.

It is debatable if we need to analyze these lines better and omit even more, if we want the numbers to be more accurate. In addition, different decompilers will give you different results.

I consider this more of a pseudoscience. Although the numbers are not accurate, it does show the large role dependencies play within our applications. This was my intention.

Take care of your Java dependencies

Ok, the numbers are large, but what does this example actually tell us? First of all, using Spring Boot for a hello world application is overkill. Secondly, Spring Boot is an application server so these large numbers are expected, right?

All these remarks are true. This was the most useless Rest endpoint you could ever write, I agree. However, this emphasizes the fact that dependencies make up for a large percentage of the binary we put into production. As Spring Boot is a framework that is used by tons of Java developers, I deliberately chose it for this example.

Think about it. In this example, the lines of code I wrote is far less than 1% of the total lines of code we push to production. Yet, as a team we are responsible for all of it. If something goes wrong with the application the development team needs to solve it. Having things like code reviewing, pair-programming and automatic static analysis in place definitely helps creating a better codebase. But what about the 99% of the code we use and depend on heavily.

Taking care of your dependencies and upgrading them on time is essential. Importing one package results in many dependencies and many possible attack factors. Creating a java dependency management strategy on how dependencies get picked, updated, and removed is something every development team should consider. From a security perspective, you should at least be warned if one of your dependencies or transitive dependencies has a vulnerability.

Scanning your application using Snyk to detect vulnerabilities in your open source dependencies can help you with this. As Snyk can be used in every step of your Software Development Life Cycle (SDLC) and is highly adaptable, it is perfect for automation. Snyk does not only detect vulnerabilities—it helps detect vulnerabilities as early as possible.

Conclusion

Taking good care of our code is not always enough—we should also keep an eye on our dependencies as they play a huge role in our applications.

By scanning your dependencies with tools like Snyk, vulnerabilities in your dependencies will not slip in unnoticed. Furthermore, implementing a solid java dependency management strategy is crucial in keeping your application maintainable, scalable, predictable, and, most importantly, safe!

Top comments (12)

Michiel Hendriks • Aug 25 '20 • Edited

Problem with many of those Spring Boot Starter is that they include a lot of dependencies you might not even use. It gets even worse when you also use an alternative start package. For example spring-boot-starter-jetty next to your spring-boot-starter-web. If you don't exclude spring-boot-starter-tomcat from spring-boot-starter-web you have even more dependency cruft.

The way Spring organized the Spring Boot packages you basically have two huge packages which have a lot of optional dependencies which get auto-magically activated when you have an other dependency included (this is all in spring-boot-autoconfigure). Those starter packages don't contain any code. If spring-boot-autoconfigure was split up in small packages for each supported library, then it would end result would be much smaller.

Another popular obese library is Google's Guava. As a library it's poorly designed, and also quite a dependency hell with their really active removal of deprecated features. The library itself is a massive 2.7MiB. In most cases people only use a really small subset of this library.

David Dal Busco • Aug 24 '20

Your article made me think "npm, maven, same same 😅". Interesting to notice that regardless if front- or backend, somehow and to some extension, the same problematic might be faced.

Thank you for sharing the experiment.

P.S.: The Url of Snyk in "Scanning your application using Snyk..." is not valid, you might want to correct it, just in case 😉.

Jing Xue • Aug 24 '20

Your conclusion is perfectly valid, but I'm not sure your experiment really illustrates it.

If you statically decompile all the dependencies, you end up with essentially the entire codebase of all of them, but I'll bet only a small portion of it would actually be loaded into the memory and a fraction of that would be actually executed. And that is true not only for your placeholder app, but for most typical applications today as well, because libraries tend to declare all the optional or feature specific dependencies. From a library developer's point of view, that is a sensible approach, because they would rather a user waste some disk space than they run into errors out of box due to missing dependencies.

Runtime coverage would probably give you a more accurate picture of how much of the dependencies is actually relevant.

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 • Aug 25 '20 • Edited

Hi Jing,

I see what you mean and I understand your point of view. When using a large framework like Spring-Boot do you actually know what is in memory? Many things are available by default and changing a single property or adding a single line of code might a domino effect. Not even mentioning the use of reflection in Spring.

In addition, although a class may not yet be loaded in memory, the class is available and can be loaded on demand. As a developer, you should be aware of this as you are responsible for the complete binary.

Small example: (and this use case happened before) if a vulnerability is found over time that let me inject SpEL (e.g. by adding a specific malicious header) to an existing endpoint. All the classes are available to me.
The same holds for deserialization / unmarshalling problems and potential code injection.
Vulns are found over time, unfortunately (but hopefully will not).

In this example you are right. Nothing much happens and no vulns are yet found. However, adding a bit of code and / or another library might already change this.
I also can go into dependency hell and the possible collisions that can come up in underlying libraries that need a specific version but I believe you get my point.

Thanks for reading the article and thanks for your comment, I honestly appreciate it.

Dave • Aug 25 '20 • Edited

So, your analysis tells you the contents of the fat jar... it's called a fat jar for a reason (and there's things we can do to minimise it's size - including but not limited to the use of modules & jlink, or graalvm etc).

The fact that we can optimise the fat jar for file size, should tell you that not all of the fat jar is loaded into memory, but does point out one concern with Spring. Spring makes an awful lot of assumptions - it's a very opinionated framework, and unfortunately, there's no way for Spring, at build time, to know what dependencies you will want to use, so it gives you everything.

This "feature" of Spring is both what makes it so popular, and so risky. A Junior will love Spring for the convenience, but Seniors and above are often wary of it - not because of the jar file size, but because of the potential attack profile (did someone set an environment variable in Prod exposing too many metrics to the outside world?)

My question: why do you care about the size of the jar? I mean, sure, if you're deploying in any one of the cloud platforms, scalability time is impacted by network transfer time. But why, in this demo, did you care? What point were you trying to make, when anyone experienced with Spring knows that the Jar files are huge by default?

Would it be more productive, in that demo session, to respond with "hey, jar files are huge, but here's one way we can minimise their size..."

I can hear the Spring devs in their office now: "Some guy just proved to the internet that we remove boilerplate and let him just write the business code he needs! Awesome."

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 • Aug 25 '20

Spring is not the issue, I honestly love spring-boot!

The issue is knowing what you are using and why
Being aware that by blindly importing dependencies, there is a risk.
You are responsible for not only the code you write, both from a security point of view and maintenance.

I'm sorry if that was not clear to you from the article.

Dave • Aug 25 '20

I'm honestly going off Spring, and have been for the last few years - mostly because of the opinionation issue.

The security aspect can be somewhat mitigated by having your own dependency repository hosted internally, with appropriate security controls, but the upgrade path (and trusting 3rd party developers both in terms of security and bugs) is troublesome no-matter what.

That said, blind dependency importing & monitoring changes isn't unique to Spring, hell, with Maven I can write custom plugins to jump into the build phases & inject whatever code I want.

Ultimately, measuring the fat jar size is probably not the best way to illustrate the number of dependencies - many of them will be in the jar, but never executed because they're not referenced from any other code - they're just floatsam & jetsam.

Maybe a better way would be to spin up something like SonarQube with an appropriate rule set (not the default), write 99.999% test coverage, and then look at the SonarQube report to see if it flags up issues - since one thing it does do, is an OWASP scan).

Jan Wedel • Aug 30 '20

This is an interesting experiment you did there and yes it shows some downsides. We also use something similar to Snyk to manage our dependencies. This actually an important point also for JS/npm or other tools.

But it’s a trade off as always. You could mentioned a couple of the positive aspects as well:

Reduced boiler plate code
Reduced risk of security issues in boiler plate code. I saw people configuring servers and HTTP clients wrong so often.
Improved development speed
A production ready application with HTTP server, monitoring endpoints. If you create all that code by hand in Java or any other language, you’ll create a buggy and vastly insecure application for sure.
The fact that we use code developed by others is actually also a pro itself because it’s open source, developed by people collectively much smarter then we are and tested by millions in production.

That said, I can sleep very well with the 518772 lines 😉

Sergiy Yevtushenko • Aug 24 '20

Spring should not be used not just for such a simple app, but in general.

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 • Aug 24 '20

I think you are missing the point here.
This is merely an example to illustrate awareness.

Sergiy Yevtushenko • Aug 24 '20

Well, I very well realize how much code we're adding with dependencies. My comment just expands "using Spring Boot for a hello world application is overkill" sentence from your post.

John Mercier • Aug 26 '20

This is an awesome idea! Thanks for sharing. I agree that the example is overkill but I also believe it points to a problem with modularity. It would be interesting to know if the spring dependencies are bloated or if it is tomcat. Could the modules be broken down further so they are not so big for trivial applications like this?

I think addressing this issue is one goal for Java Modules and building applications with jlink.

Some may not see the jar size as a big deal but as rates of deployments increase I can see it being a problem. I imagine deploying 10 times a day is not big deal but deploying millions of times a day is.

View full discussion (12 comments)