
Java may be verbose, but who cares?

Dan Lebrero · Originally published at labs.ig.com · 3 min read

This article originally appeared on IG's blog

After more than 15 years of Java experience, I have tended to brush aside comments about Java's verbosity with one of the following arguments:

  • lines of code (LOC) is a bogus metric;
  • IDEs generate 90% of my Java code;
  • lessons learned from Perl's notoriously incomprehensible conciseness.

LOC metrics are simply not important.

Or are they?

Some time ago we started building our first Spark jobs. The first two that we wrote were basically the same:

  1. Read a CSV file from HDFS
  2. Transform each line to JSON
  3. Push each JSON to Kafka
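The three steps can be sketched in plain Java. This is only an illustration, not the actual job: the real version used Spark, HDFS and Kafka, so here the CSV "file" is a list of lines, the headers are hypothetical, and "pushing to Kafka" is a stand-in print, keeping the transformation itself as the focus.

```java
import java.util.*;
import java.util.stream.*;

public class CsvToJsonJob {

    // Step 2: turn one CSV line into a JSON object by pairing each value
    // with its header. (No quoting/escaping handling, for brevity.)
    static String toJson(String[] headers, String line) {
        String[] values = line.split(",");
        return IntStream.range(0, headers.length)
                .mapToObj(i -> "\"" + headers[i] + "\":\"" + values[i] + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        String[] headers = {"id", "price"};
        List<String> csvLines = List.of("1,100", "2,200"); // step 1 stand-in
        csvLines.stream()
                .map(line -> toJson(headers, line))        // step 2
                .forEach(System.out::println);             // step 3 stand-in
    }
}
```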

It so happened that we wrote one in Clojure and one in Java. When we reviewed the code, this is how the Java version looked:

[Picture: the Java version's classes]

At first I was surprised that there were so many classes.

Then I was surprised that I was surprised about finding so many classes. After all, the code was the perfectly idiomatic Java code that we all have come to write.

But why was I surprised in the first place? Probably because the Clojure version looked like this:

[Picture: the Clojure version, a single file]

But maybe it was one huge file with hundreds of lines of code? No. Just 58 lines of code.

Perhaps the Clojure version was a completely unreadable gibberish of magic variables and parentheses all over the place? Here is the main transformation logic between the two versions:

[Picture: side-by-side comparison of the main transformation logic]

The only difference in readability is that the Java version has a lot more parentheses.

The code review

I usually would not pay attention to Java's verbosity, but during the Java code review I found myself thinking about:

  • Which class should I start with?
  • Which class should I go to next?
  • How do the classes fit together?
  • What does the dependency graph look like?
  • Who implements those interfaces? Are they necessary?
  • What are the responsibilities of each class?
  • Where is the data transformation?
  • Is the data transformation correct?

While the Clojure code review was about:

  • Is the data transformation correct?

This made me realize that the Clojure version was far simpler to understand, and that having a single file with 58 lines of code was a very important reason for it.

What about bigger projects?

I don't have any bigger project where the requirements were exactly the same as here, but it is true that our Clojure microservices have no more than 10 files, usually 3 or 4, while the simplest of our Java microservices has several dozen.

And from experience, we know that the time to understand a codebase with 4 small classes is not the same as understanding one with 50 classes.

Incidental Complexity

So given that the inherent complexity of the problem is the same, that the Clojure version expresses the solution in 58 lines of code while the Java version requires 441, and that the Clojure version is easier to understand, what are those extra 383 lines (87% of the codebase) of perfectly idiomatic Java code about?

The answer is that all those extra lines of code fall into the incidental complexity bucket - that complexity that we (programmers) create ourselves by not using the right tools, complexity that our business paid us to create, pays us to maintain, but never really ever asked for.

Are lines of code important? Not as a measure of productivity, but certainly as a measure of complexity, especially if this complexity is incidental instead of inherent.

Imagine deleting 87% of all the code that you have to maintain!

Posted on Jun 12 '17 by Dan Lebrero (@danlebrero)

Technical architect with more than 15 years of software development experience. A long time Java practitioner, he now also loves ().

Discussion


LOC is often used to measure complexity, and it is often right: not for comparing two programs with similar LOC, but when the numbers vary greatly.

The conclusion then is to use languages and libraries that require less LOC. That's kind of obvious, isn't it?

Well, it is more complex than that, I am afraid. First, when comparing Java to Clojure or whatever, you must be honest. Do you have to be that verbose in Java? Really?

You can use public fields for POJOs, or Project Lombok. You can now use functional programming with quite interesting collections libraries like Google Guava or Eclipse Collections. You can now use lambda expressions.

You can make sure your API is DRY and easy to use, and avoid complex design patterns that are most often not necessary.

Just doing that, you'll end up with a far smaller program than before. This is not perfect, but it is far better.
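The commenter's point about lambdas and streams shrinking Java can be sketched like this. The before/after pair below is an invented example, not from the article's codebase: the same filter-and-transform written first in pre-Java-8 style and then as a stream pipeline.

```java
import java.util.*;
import java.util.stream.*;

public class ConciseJava {

    // Old style: explicit loop, mutable accumulator, manual filtering.
    static List<String> shoutLongNamesOld(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Java 8 style: one pipeline, no mutable temporaries.
    static List<String> shoutLongNames(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```

Both methods behave identically; the stream version simply states the intent (filter, then map) rather than the mechanics.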

And if you do that, you still have your IDE at hand: you can still instantly find all the places in your code where a function is used, you can jump instantly to a class/function definition, you can instantly refactor your code... Not bad.

A key point also is that code is not about doing things for yourself. It is a shared effort. Many people work on the same software: they extend it, maintain it, evolve it. So programming languages, APIs, design patterns and coding style become a shared culture, things that your team and the whole community understand and share, so they can instantly understand what's going on.

The best thing about a nice language or API used by everybody in your team for years is that it is shared knowledge, a shared way of thinking, a shared culture. People think naturally in it. So there is even greater incentive to use the best programming language and API, isn't there?

Well, I'm not so sure. If you can get your whole team and a good share of your company to master your language and API of choice, that's great. But this may not be easy. For any decently sized existing team/company, if your choice isn't already popular, it means that many people will have no idea about Clojure or Scala or Haskell or whatever.

You'll have to initiate culture change. And most people will not want to change at the beginning. They will not necessarily be naturals, so they may need a year or more to adapt. Even if you control hiring, you may fail to convince the few available experts ready to work for you to relocate to your area and accept the job.

This isn't easy. You should try to broaden the shared culture in your company, and you should ensure that newcomers are up to speed as fast as possible by spending time on training and so on. But if you go very far, like using a language almost nobody uses (whatever the reason), it increases the cost dramatically.

 

Hi Nicolas!

Nice to hear from you again :).

Just to make it clear, it is not LOC. It is LOC + readability. We can all create really small programs that we cannot understand 10 mins later.

In all honesty, I consider the Java version to be idiomatic. Note that I have been doing Java for 17 years.

Right now idiomatic Java is Java 8 with streams and method references using Lombok annotations for any data class. We use guava extensively, but I would not consider that to be necessarily idiomatic.

On this particular example, both the Java and Clojure version are in fact using the HDFS and Spark API to do the work. The same one, both of them.

I personally, from the clojurian side of my brain, consider the interfaces in this particular example to be unnecessary. But my Java side is not surprised to find them.

In fact, those interfaces were created because of design patterns and coding style that have become part of the shared Java culture.

Change is scary and more if you are in a comfortable position. Going back to being novice is extremely hard.

About finding experts, you have the same problem in any language. Even if there are more Java experts, there are also more companies looking for Java experts.

Also, given that you don't need to deal with all the incidental complexity, what if your company could do the same amount of work with half the people? That will decrease the costs dramatically.

People that learn non-mainstream languages usually do it in their free time. They are interested in their craft and profession. I want to hire and work with those guys.

That is not to say that mainstream devs do not care. It is just more likely.

My experience is that a graduate is able to learn and be productive in Clojure within one month.

About the IDE, you don't need one when the whole the program fits in one screen ;).

About Clojure adoption in particular, see blog.cognitect.com/blog/2015/9/21/... or, if you believe in the ThoughtWorks tech radar: thoughtworks.com/es/radar/language...

Thanks a lot for the comments!

Daniel

 

Well, you see, again it's the culture. The language plays a role, but the culture around it is maybe more important.

As for the program fitting the screen: well, I'll be honest. If your Clojure program fits the screen, it is likely something trivial... Very fast, even with the most concise programming language in the world, you end up with tens of thousands, hundreds of thousands of LOC.

Our new project, Java 8 and all, is currently about 45,000 LOC. Without the comments, the imports, the unit tests, the generated code or the method declarations; just the executable LOC as seen by Sonar. The real code is likely more like 200K.

Maybe in Clojure it would be 50K. That would be a huge change. But I would still have hundreds of files, and I'd still want to be able to extract methods, rename a method in 5 seconds everywhere, find all occurrences where some piece of code is used, with the code hierarchy and all...

The code would still be too big to fully fit in my brain, and I would still not know by heart the code written by others.

So I would still hugely benefit from reliable code completion, with expected types and all directly visible at my fingertips.

I would still benefit from imports that just work without me spending a second on them.

And Clojure is not always better. Mutability is valuable too, and Clojure sucks at it.

So you see, even though I love the beauty of Clojure, it isn't as simple as that. In particular, I agree Clojure is significantly smaller and lighter for code, but there are other costs.

When the program really grows, you have to find solutions that just don't come from a bit more readability, in particular if it negatively impacts your ability to understand the system at a wide scale (thanks to the tooling, mostly).

Hey Nicolas,

I cannot really comment. In the past 10 years, I have always worked in some kind of service oriented architecture. I don't dare to call it microservices as it is still not clear to me how micro is micro.

On such architectures, the biggest codebase that I have worked with was 20k lines of Java code.

If the code grew larger than that, it would be a symptom of too much complexity in the same place, and we would split it.

We found it easier to work with small, lightly coupled processes than with one big monolith.

We have right now more than 100 such services.

My only experience with large codebases is that nobody wants to touch them, mostly because they have grown to be a monstrosity of spaghetti code.

Why are none of those codebases composed of small, decoupled components with clear interfaces and boundaries? If we had such clean codebases, the need to make a change/refactor across component boundaries would rarely exist, hence the need for sophisticated tools would be smaller.

What if we could build such clean programs that tooling was not necessary? Wouldn't that be better?

I agree mutability has its value. No program would be useful without it.

But IMHO immutability has more value.

Thank god, Clojure doesn't make it idiomatic to use mutability, which means that writing mutable code requires an extra effort. On the other hand, Java requires you to make an extra effort when writing immutable code.

Do I need to sell you immutability over mutability? What you mention seems like a strength on Clojure's side.

Thanks again for your comments and for the civilized tone. It is a pleasure talking with you.

Cheers,

Daniel

To be honest, a microservice is a special case of modularity where the boundaries of a module are defined by network interfaces. The same core design of having separated components with low coupling still applies.

But the complexity of having 100 components, each with 20K LOC, isn't generally the same as one component with 20K LOC. After all, the total number of lines of code is 2 million!

If the design is good, some components never interact with each other, and that's the consequence of a good architecture. So this decreases the overall cost.

But if, for a typical application/product/whatever that has a set of features, you really need, say, 10 components with 20K LOC each, and these 10 components have nothing in common with the other 90 components, you can say you have 200K LOC for that application/product/whatever. It may not be fair to consider 2 million as the real number, but 20K isn't necessarily the truth either.

I wouldn't put the unit of an application/product/whatever at the service or component level, except if that service/component is totally isolated, without any interactions with other services/components.

But we also have services there, of course. Thousands of them. And many have lots of components inside. We just operate on a bigger scale.

What I can see from experience is that sometimes some data flows across many services, and that data needs to be updated. Maybe just a new tag; in reality it is often a bit more complex than that, of course. The total cost of having that new data handled correctly among all the services is huge.

On mature software, a team may end up doing just that: adding a few more bits of info here and there rather than lots of new things. But the cost of adding a piece of data to be propagated and handled correctly among many services is huge.

The coupling is low, but not zero. And the cost of maintenance and evolution still grows with the number of services or components.

Also, this changes the way things are designed. Time is spent analysing message exchanges, their orchestration and so on, with associated documentation and impact. This shifts where the complexity is, but the complexity, ultimately, is still there.

Hi Nicolas,

Thanks a lot for the thoughtful answer.

We also have the problem that sometimes when adding a new field or adding a new enum value, we need to change several services.

In my experience this usually happens because we autogenerate our Java classes from some schema, which makes our classes too rigid.

I agree that microservices are more complex, but also more flexible.

Would building microservices inside a monolith give us the best of both worlds?

To me, there's no silver bullet. Microservices are just a way to componentize an application/product. They are great for some cases, terrible for others.

I would not consider microservices to componentize the plugins of an image-editing desktop application, as an example ;)

To me, whether you use microservices, fat services, a single service or no service at all, the network is not really what's important. What is important is that your components are well componentized.

Example: you or me may say that a service is XXX LOC. But we do not include the JVM code. Neither do we include the Apache Commons libs, or Spring, or the application server (or Netty) code. If we use Clojure, we do not include that. And we consider only the real source code, not the compiled code...

This is because these components are extremely well componentized. The abstraction they provide is so great that you never have to look inside at how they work. You can, and it has value, but you don't have to.

Often, even using the network, there is more coupling than we think between components. The format of the data we use, even if it is JSON or XML, is far less generic than we think, because if we try to incorporate the data from another provider, we may miss some concept entirely and the exchange format has to be reviewed.

These things are hard. Network services allow you to have things on different processes/computers, and that's nice. Resilient formats like JSON/XML help with adding new features without breaking existing code, as long as clients are smart enough to ignore what they don't know in the message.

But you can achieve this with services defined as interfaces in Java (or Clojure). The ideas are quite simple in the end: the input is created by the client, and the receiver does not modify it; the output is created by the service, and that service ensures there is no dependency on its internal state that would create issues.

The service doesn't take bare parameters, as adding more breaks client code and there is a narrow limit on what you can pass; instead, use POJOs or Clojure maps. In both cases, it is possible to extend the data structure and assume default values for things that are missing (with a POJO, the default value can always be present).
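The idea of an extensible parameter object with defaults for missing values can be sketched in a few lines. The `retries` key and its default are hypothetical examples, not from the discussion; the point is that old callers who omit a newly added key keep working unchanged.

```java
import java.util.*;

public class ExtensibleRequest {

    // Pass one extensible data structure instead of a growing parameter
    // list; missing keys fall back to a default, so adding a new optional
    // setting later does not break existing client code.
    static int effectiveRetries(Map<String, Object> request) {
        return (int) request.getOrDefault("retries", 3);
    }
}
```

The same trick works with a Clojure map and `get` with a default, or with a POJO whose field is initialised to the default value.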

Debugging inside the same process and performing integration tests is much easier and faster and you remove entirely the need for serialization/deserialization and you don't have to manage the various network errors...

Don't get me wrong, you won't necessarily want everything in the same VM either. It depends on the problem you are trying to solve. A good architecture brings natural boundaries, and some of these boundaries would be the network.

But the network itself brings a lot of complexity. I can see that clearly. In my company we have thousands and thousands of services. Many of these services are what you would call large codebases, and we have many farms of servers. It quickly becomes complex to understand which service is the right one to call, which dependent services will be impacted when you want to add a new feature, how to scale and keep the latency low, how to scale the network itself...

One day or another, the comforts of the abstraction that protected us for years come back to bite us ;) There are hundreds of people in my company working on the enterprise bus, helping with the configuration and maintenance of all services for various clients, and so on.

Hi Nicolas,

All very true.

The point that I was trying to make with microservices is that I have found that they give you better components. I think it is because it is usually one team that produces them and several teams that consume them, which forces a more isolated and thoughtful design, forces things to be backwards compatible, and also makes the boundaries obvious.

I do not think they make things simpler. As you say, the network is a huge headache on its own, but I think they force us to follow good practices.

On the other hand, with monoliths I find it easier to put in some hack, break encapsulation and end up with a big ball of mud.

Of course, I have also seen the death star from Netflix :)

Thanks a lot!

Dan

I agree that network services with a resilient exchange format (like XML, Protobuf, JSON...) greatly help with the backward-compatibility aspect, and if you are serious about the docs, versioning and all, they are a great way to isolate a component.

I have always found componentization extremely hard to achieve. A web service can very quickly become non-backward-compatible, or expose a proprietary format or a data structure that is incomplete or not future-proof. The cost of maintenance is then huge.

When you are in the same process, the issues are far easier to solve, but also far easier to create, and once there are too many, nobody can manage to remove them. This is because languages, at least ones like C, Java or Clojure, do not really solve modularity issues.

Java 9 is going to take a stab at a module system, and OSGi has been there for quite some time in the Eclipse ecosystem. But that last one is not easy to use.

What we do in our project is that each module, in its simplest form, has 3 Maven modules: an interface module, an implementation module, and an aggregator module that depends on both (the interface with the standard compile scope, the implementation with the runtime scope).

Client code depends on the aggregator. It sees the public interface perfectly, but can't statically reference the implementation module, as it is not included in the classpath at compile time.

Linking is done either by Spring (typically with annotations) or, for components that are not expected to be bound to Spring, by a singleton in the interface that accesses the implementation by introspection, typically via a ServiceLocator or an equivalent mechanism.
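A minimal sketch of that locator idea: the interface finds its implementation by class name through reflection, so client code never compiles against the implementation type. All names here are hypothetical, and both classes live in one file only to keep the sketch self-contained; in the real setup described above, the implementation would sit in a separate, runtime-scoped Maven module.

```java
public class LocatorDemo {

    public interface Greeter {
        String greet(String name);

        // The interface knows only the implementation's binary name, not
        // its type, so there is no compile-time dependency on it.
        static Greeter locate() {
            try {
                return (Greeter) Class.forName("LocatorDemo$DefaultGreeter")
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("No Greeter implementation on classpath", e);
            }
        }
    }

    // Stands in for the runtime-only implementation module.
    public static class DefaultGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }
}
```

Swap the implementation by putting a different class on the runtime classpath; the client's `Greeter.locate()` call does not change.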

The implementation can implement any interface and have any kind of public/protected visibility, and clients can't abuse that without doing it on purpose (moving the class, making the access public or adding a compile-time dependency on the impl). Move to separate Git repos (for a group of modules around a feature, likely) and it becomes harder for this to go unnoticed.

What we still miss, but I guess could be added, is a build failure in Maven if you use any scope other than runtime on an "impl" module.

I worked with OSGi for four years back in the early days, and it is one of the best frameworks for forcing you to think about how to componentize your application.

But the most useful lesson was to think about how your system should behave if one of the component was not present or was being restarted/upgraded.

It is an experience that translated quite nicely to the micro services world and that still helps me every time that we build a new system. Somebody should write a book called "What happens if this dependency is not available?".

What OSGi did not teach me was that the network is a PITA. I learned that later :)

Thanks!

Dan

I just wanted to say it was a pleasure to watch your discussion guys haha

 

LOC is not a bogus metric, it just depends on what one is measuring. If you're trying to measure code review and maintenance cost then LOC is definitely a valid metric. More lines equals more effort.

 

Even between different languages?

My Java-wired-brain has said hundreds of times that it was not.

I have changed my mind.

What about you?

Do you optimize Leaf for succinctness? Does that impact readability?

 

Yes, I think so even when comparing across languages. Given functionally equivalent solutions in two languages, including error processing, I would say the one with lower LOC* is of higher value.

Readability is an issue, but I find syntax redundancy is the biggest problem in readability. Less code is simply less to understand, even if it involves more complex operations.

Definitely for Leaf I'll be optimizing for succinctness, but I'm not in favour of complex syntactic structures. I also have inferred and implicit typing, which removes a lot of syntax (it can look like a dynamic language without being one)

(*We might need to be a bit careful in defining LOC though, perhaps just total functional code-size is better, to avoid packed lines somehow getting an undeserved higher score)
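The footnote's caveat can be illustrated with a naive "functional LOC" counter that ignores blank lines and line comments, so sparse formatting inflates the score less. This is an invented toy, far cruder than real tools like Sonar mentioned elsewhere in the thread.

```java
import java.util.*;

public class LocCounter {

    // Count only lines that carry code: drop blanks and lines that are
    // purely a line comment (Java-style "//" or Clojure-style ";;").
    static long functionalLoc(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(l -> !l.isEmpty())
                .filter(l -> !l.startsWith("//") && !l.startsWith(";;"))
                .count();
    }
}
```

Even this crude measure avoids rewarding a program for cramming statements onto one line or penalising one for generous whitespace.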

Very good point about LOC.

Best of luck with Leaf!

 

More lines equals more effort.

Ah! Not necessarily always true.
Code is written for other people to read, so it needs to be expressive, which may mean more lines sometimes.

 

Agreed: varied comments, personal style, and syntax can affect LOC without increasing complexity, so it's silly to make a big deal about tiny differences in program length. A 400-line program probably has the same complexity as a 300-line alternative, so saving a few lines probably isn't worth making a program illegible (I'm glaring at you, RegEx). At some point, however, a big difference in size indicates a big increase in complexity. For example, a 1,000,000-line program is definitely more complex than a 100-line one.


relative complexity = round (program A LOC / program B LOC)

Measuring complexity reminds me of Fermi Problems. It's really hard to quantify accurately, so I have to make squishy comparisons based on educated guesses. My rule of thumb: if a program is more than twice as big as another, then it's significantly more complex.

Thus, if switching to a new language would cut my program in half (or more), then it's worth considering.

 

Code is written for other people to read

And this is exactly why having more lines always means more effort and more troubles for every reader of this code.

 

It's curious that some of your conclusions were actually explained by Clojure's creator Rich Hickey: infoq.com/presentations/Simple-Mad...

As a Vim user, I've always been wary of languages which require an IDE to be able to do even the smallest amount of development on a project. Also, I always found Java to be overly structured, with too many layers of abstraction that prevented me from fully understanding what was happening.

And the point is this: not only is Java extremely verbose and in need of an IDE to write the code for you (if you want to keep your sanity, that is), it is also very complex, hence hard to understand. So while the first problem might be ignored, the second one is a real concern.

The problem that you have is not that the code is ~440 lines long, but rather that you need 7 files/entities to complete a quite trivial task. What Clojure is telling you here is: you need a couple of simple functions tops. What justifies the added complexity of Java?

 

Good point about IDEs. I always thought that they were mandatory, who wouldn't want to use them?

They seemed like a positive thing: "my language has a better IDE than yours, hence my language is better".

I still think the more tools the merrier, but it is more important to need fewer tools.

Thanks a lot for your comments!

 

The thing about boilerplate code, which a lot (most?) of that excess is, is that you can ignore it.

But then there's all this code in your app you're ignoring. The language encourages you to ignore the code that you're working in.

There's another aspect to this: It's not just your code you're ignoring. The odds are you're building upon layer after layer of similarly boilerplate-ridden code. This is fine, until you have to figure out what's actually going on.

In Clojure, if you're calling a function, the odds are good you can figure it right out from that. Every now and again I find that there's one layer beneath the one you want—and these days I find that irritating and excessive, never mind the half-dozen, dozen or more layers you may have to dig down in the object-soup. (Digging soup layers? Mixed metaphor felony!)

 

It's true LOC matters, but this example is not fair: for data pipelines, functional programming should be used; it is just a better choice.

On the other side, I would prefer more LOC over a hackish two-liner of Pythonish code which is hard to grasp, test and refactor.

 

Java has a lot of verbosity, and as languages like Kotlin & Scala prove, quite a bit of it is unnecessary given progress with e.g. type inference and other language constructs that are easier to deal with without compromising type safety. I'm currently transitioning from Java to Kotlin. It's not a massive change, but nice nevertheless.

The nice thing is that, unlike with other languages, this is not a throwback to the stone ages in terms of tools, frameworks, and practices. I get to keep all of that and get more compact code. In other ecosystems, refactoring without technical support is more of an aspirational thing; any superficial inspection of any Ruby, JavaScript, or Python software will show you that. I've facepalmed my way through quite a few such projects. There's a reason not a lot of Ruby & JavaScript code survives its first anniversary. With proper technical support, refactoring is a no-brainer. My IDE tells me when there is a problem instead of my program failing to run; I know before I finish typing that it will compile. This excludes a very wide category of bugs from ever happening. This is why Java became so popular.

Now other languages are finally catching up and there are even half decent IDEs appearing that are not written in Java. The fact that that was unthinkable for close to 20 years tells you all you need to know about Java.

 

I cannot agree more. I have this project, which queues jobs to Redis.

github.com/yeospace/cidekiq

It's very short and simple.

I realize that functional programming removes lots of abstraction and focuses only on data transformation. I no longer have to constantly think about where to keep these methods or properties. Whenever I refactor and separate a class/method, the state has to go with it, and I don't know where a good place is. Sometimes it's hard.

The more abstraction we added, the worse it got, since every class started to do only one thing: wrap around its state. At that point, I felt like: why not remove all the crap and expose only a single function that does one thing, with the state passed into the function?

 

And I couldn't agree more with your comments.

FP feels somehow liberating, but ask me again in ten years time ;)

Thanks for the comment,

Dan

 

Hey Dan,

Great post!

Years ago I was tired of people complaining about "so many parentheses". So I did some simple analysis of Java, JavaScript, and Lisp.

It turned out Java and JavaScript had more. Fewer per line, but more in total for the same functionality.

Then I analyzed other "brace" characters like square braces and curly braces. If you add all of those in, Java and JavaScript went through the roof.

It's unfortunate, but I think people who complain about # of parentheses are pointing at something real but don't know how to articulate it. Clearly, the # is not what's important. Perhaps it's the density? Just about every line of Clojure has at least one. Some lines close 5-10 at the end.

It could also be the uniformity. Most everything is done with parens in Lisp, so if you don't know what that first thing in function position is, it looks like a soup of parens.

Finally, it could be that people are used to braces being really meaningful syntax in their language. C-style languages use curly braces for blocks, parens for function calls, and squares for subscripts. If you are used to seeing those things as guideposts through the code, you'll be disappointed in a Lisp. Your intuition will misguide you and it will just look like a mess, as any foreign syntax would.
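Eric's counting experiment is easy to reproduce in miniature. The counter below tallies parentheses, square brackets and curly braces in a source string; the two one-liner snippets it compares are made up for illustration, not his original corpus.

```java
public class BracketCount {

    // Count every paren, square bracket and curly brace in a source string.
    static long brackets(String source) {
        return source.chars()
                .filter(c -> "()[]{}".indexOf(c) >= 0)
                .count();
    }

    public static void main(String[] args) {
        String java = "list.stream().map(x -> f(x)).collect(toList());";
        String clojure = "(map f list)";
        System.out.println(brackets(java) + " vs " + brackets(clojure));
    }
}
```

Even on this tiny pair, the Java line carries several times as many bracket characters as the Clojure form, matching his observation that Lisp's parens are denser per line but fewer in total.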

Rock on!
Eric

 

Hi Eric,

Very interesting, your thoughts about paren density and meaningful syntax. Clojure's use of vectors helps a lot with both.

I have given up on the parens discussion, as I have been on the other side of the fence and no amount of reasoning would convince me.

Nowadays I just brush the discussion aside with "you will get over it and you will love it. What other concerns do you have with Clojure?"

Plenty of times the discussion ends there, and one less convert for the Church of Lisp.

Thanks a lot for the comments!

Dan

 

Nice comparison, I could say the same about Ruby code as well :).

 
 

Wow, awesome conversations in the comments. As interesting as the article!

 

Clojure's main problem is line-based diffs vs. form-based diffs in popular code review apps like GitHub.

 

If you are having trouble making the switch, I would recommend building some side project, open-sourcing it and blogging about it. That will give you a very good example of your craft.

Best luck!

 

"The only difference in readability is that the Java version has a lot more parentheses."

Love that one (isn't Clojure a dialect of Lisp?)
:)

 

That joke, which in this case happens to be true, was aimed at Clojure/Lisp developers. Glad you enjoyed it :)