DEV Community

dpsutton
dpsutton

Posted on

Logging in Clojure: jar tidiness

This article will work through an easy to encounter yet easy to avoid pitfall that can occur with clojure.core.logging, the de facto logging facade for Clojure projects. I'll build up an example project that will mimic a real world scenario brought down to just its basics.

The gist is that logging time (even below the active threshold of logging) will be increased by two orders of magnitude, causing a form that takes 379 ms to evaluate to take over 19.5 seconds just by adding a jar to the classpath. The form is a bit contrived, but it does simulate arbitrary "work" and how logging could tremendously slow down your code if you forget how your logger was configured (perhaps several years ago).

tl;dr
  • specify your clojure.tools.logging implementation. It might drift over time from what you originally thought it was
  • watch your deps. It is very easy to accumulate more than you think

The setup

Let's setup a basic project that includes a logger. I think the nicest way to get a new project up is with one of the tools that Borkdude has created, called neil. If you have this installed, just run the following in an empty directory:

neil add build
Enter fullscreen mode Exit fullscreen mode

This gives us a great template to start our example. If you don't have neil installed (and you should), just create the following deps.edn:

{:deps {}
 :aliases {
 :build ;; added by neil
 {:deps {io.github.clojure/tools.build {:git/tag "v0.8.3" :git/sha "0d20256"}}
  :ns-default build}}}
Enter fullscreen mode Exit fullscreen mode

Find a log4j2 tutorial or copy the config from your last project, and we have the following state:

{:deps {org.clojure/tools.logging           {:mvn/version "1.2.4"}
        org.apache.logging.log4j/log4j-api  {:mvn/version "2.18.0"}
        org.apache.logging.log4j/log4j-core {:mvn/version "2.18.0"}}
 :aliases {
           :build ;; added by neil
           {:deps       {io.github.clojure/tools.build {:git/tag "v0.8.3" :git/sha "0d20256"}}
            :ns-default build}}}
Enter fullscreen mode Exit fullscreen mode

Make a src directory and paste the following in src/log4j2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <Console name="Console">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="WARN">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
Enter fullscreen mode Exit fullscreen mode

Creating a jar

Let's create a jar:

$ clj -T:build uber
$ cp target/lib1-1.2.1-standalone.jar log4j2-uber.jar
Enter fullscreen mode Exit fullscreen mode

I'm going to leave tooling out of this and run everything through a repl on the command line right from the jar. One other thing I want to do is include the incredible criterium library so we can profile. I'm deliberately including criterium separately like this because you shouldn't have a dev-time tool like criterium in an uberjar. And knowing how to easily combine other jars with your real production jar can be very helpful. I grabbed the jar from my .m2 cache.

$ java -cp log4j2-uber.jar:criterium.jar clojure.main
user=> (require '[clojure.tools.logging :as log])
nil
user=> (time (reduce (fn [acc x] (log/info "accumulating") (+ acc x)) 0 (range 10000000)))
"Elapsed time: 447.684167 msecs"
49999995000000
user=> 
Enter fullscreen mode Exit fullscreen mode

And now we've simulated some work that takes about half a second on this run.

The change

Now suppose work requirements come up and we need to deal with sparksql. It's not particularly relevant, but you can imagine any particularly large jar, or maybe just the accumulation of jars over time. Let's add org.apache.hive/hive-jdbc {:mvn/version "3.1.3"} to our :deps map.

;; deps.edn with the hive-jdbc driver
{:deps {org.clojure/tools.logging           {:mvn/version "1.2.4"}
        org.apache.logging.log4j/log4j-api  {:mvn/version "2.18.0"}
        org.apache.logging.log4j/log4j-core {:mvn/version "2.18.0"}

        org.apache.hive/hive-jdbc {:mvn/version "3.1.3"}}
 :aliases {
           :build ;; added by neil
           {:deps       {io.github.clojure/tools.build {:git/tag "v0.8.3" :git/sha "0d20256"}}
            :ns-default build}}}
Enter fullscreen mode Exit fullscreen mode

Changes required to build

This isn't particularly relevant to our build but we do have to work around the way tools.build works. The lib tools.build is outstanding and quite helpful. It takes a bit of boiler plate but we get simple programs that make our jars. The way it creates jars is by copying all of the classfiles from the jars we depend on to make the one uberjar. I'm writing this post on OSx which is not case sensitive. Jar files are zip files which are case sensitive. And this leads to a problem: a jar file can easily include foo/bar.txt and a file FOO; but my case insensitive filesystem cannot have these two things coexist. From its perspective, I want to have a file named foo, and a folder also named foo (regardless that we know its uppercase). We need to exclude these files so we don't get into this unrepresentable state.

We update the build.clj file that neil created for us by adding some :exclude entries to prevent us from getting into this situation. See the doc-string of b/uber for more documentation.

(defn uber [_]
  (clean nil)
  (b/copy-dir {:src-dirs ["src" "resources"]
               :target-dir class-dir})
  (b/compile-clj {:basis basis
                  :src-dirs ["src"]
                  :class-dir class-dir})
  (b/uber {:class-dir class-dir
           :exclude ["META-INF/license/.*" "license/LICENSE.*" "license.*"]
           :uber-file uber-file
           :basis basis
           :main 'clojure.main}))
Enter fullscreen mode Exit fullscreen mode

New jar with hive

Again, it's not particularly important that the jar is the hive-jdbc jar, but its representative of state that can accumulate over time in a project, and even from a single jar. Let's build the jar and run the same form we did earlier.

$ clj -T:build uber # this can take a little bit, its a big jar
$ cp target/lib1-1.2.1-standalone.jar with-hive.jar
$ java "$(socket-repl 6001)" -cp with-hive.jar:criterium.jar clojure.main
Clojure 1.11.1
user=> (require '[clojure.tools.logging :as log])
nil
user=> (time (reduce (fn [acc x] (log/info "accumulating") (+ acc x)) 0 (range 10000000)))
"Elapsed time: 19681.165584 msecs"
49999995000000
Enter fullscreen mode Exit fullscreen mode

And there's our bug: we have done nothing to the existing code. We have not changed our log4j2.xml file. We haven't adjusted anything except added a jar to our classpath but now our simulated "work" (adding up a few numbers) just took over 19 seconds instead of half of a second.

Diagnosing

Since we have criterium on our classpath, we can benchmark from each repl.

user=> (require '[clojure.tools.logging :as log]
                '[clojure.tools.logging.impl :as log.impl]
                '[criterium.core :refer [bench]])
nil
user=> (bench (log/trace "no-op"))
Enter fullscreen mode Exit fullscreen mode

This will print out some stats but the numbers we care about are the second line

log4j2-uber:   21.209608 ns
with-hive.jar:  1.911062 µs (1911 ns!)
Enter fullscreen mode Exit fullscreen mode

There is our two orders of magnitude. And one more clue as to what is going on can be found by checking the clojure.tools.logging/*logger-factory* for each repl:

;; log4j2-uber.jar:
user=> (log.impl/name log/*logger-factory*)
"org.apache.logging.log4j"

;; with-hive.jar
user=> (log.impl/name log/*logger-factory*)
"org.slf4j"
Enter fullscreen mode Exit fullscreen mode

Explanation

The de facto logging facade for Clojure is clojure.tools.logging. It tries to present a unified interface to interacting with the java logging ecosystem at large. It does this by knowing how to ensure (log/trace "no-op") get matched up to the trace classes and logger for your particular logger.

It accomplishes this by checking for the presence of classes which indicate that particular logger is on the classpath, and uses the first one it fines.

(defn find-factory
  "Returns the first non-nil value from slf4j-factory, cl-factory,
   log4j2-factory, log4j-factory, and jul-factory."
  []
  (or (slf4j-factory)
      (cl-factory)
      (log4j2-factory)
      (log4j-factory)
      (jul-factory)
      (throw ; this should never happen in 1.5+
        (RuntimeException.
          "Valid logging implementation could not be found."))))
Enter fullscreen mode Exit fullscreen mode

We were previously falling through to the (log4j2-factory) but jar accretion now means that the first (slf4j-factory) is registered as our logger factory.

Fixing

The readme has a nice section about choosing a logging implementation. I have learned my lesson that this is way more important than I had originally thought. This was an afterthought at best, and usually completely ignored. But leaving it alone can leave you at the mercy of some very serious performance penalties.

From the README:

The above approach is problematic given that applications often inadvertently pull in multiple logging implementations as transitive dependencies. As such, it is strongly advised that you set the system property.

$ java -Dclojure.tools.logging.factory=clojure.tools.logging.impl/log4j2-factory -cp with-hive.jar:criterium.jar clojure.main
Clojure 1.11.1
user=> (require '[clojure.tools.logging :as log])
nil
user=> (time (reduce (fn [acc x] (log/info "accumulating") (+ acc x)) 0 (range 10000000)))
"Elapsed time: 373.54025 msecs"
49999995000000
Enter fullscreen mode Exit fullscreen mode

So with the same with-hive.jar, if we set our logger implementation we are back down to 373 ms instead of 19 seconds.

Lessons:

I hope this is helpful for people. It is a bit of a contrived example, but I hope a germ of real world experience is present. The tight loop in here makes it quite obvious that something has gone wrong. But in real life, these performance lapses might be spread out in time in your code, never adding enough microseconds enough for you to feel, but always sapping your throughput.

I also think that this scenario isn't as far-fetched to encounter in real life. When's the last time you audited your deps to see how many loggers you actually have? Have you explicitly set your logger-factory for your uberjar? The answer for me was "it's been a while" and "no".

A good guess is to consider what you think your logger is and then evaluate the form (log.impl/name log/*logger-factory*) and see if you got it right. Check clj -Stree | grep log in this sample repo. It includes our log4j2, slf4j, logback, commons-logging, and I think log4j. It's easy to accumulate these.

Postscript

There's actually an extra wrinkle to this, but it isn't necessary to get the general gist of this. But this bit us at work and it hit us when we didn't add a new logger to classpath but silenced a warning.

We were seeing a logged message on startup:

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.

Some googling showed us that this warning pops up when you have a multi-release jar but you don't have the Multi-Release: true properties in your manifest file. So we made sure our manifest was correctly added and forgot about it. Then we had issues come in reporting queries had gotten much slower. It had been a while, and it took a while for a colleague to track the regression down to finding loggers when logging and we confusedly reverted the change.

You can verify this for yourself by editing with-hive.jar's manifest to remove the multi-release. Remembering that the simulated work took 19 seconds with the slf4j logger

$ java -cp with-hive-not-multirelease.jar:criterium.jar clojure.main
Clojure 1.11.1
user=> (require '[clojure.tools.logging :as log])
nil
user=> (time (reduce (fn [acc x] (log/info "accumulating") (+ acc x)) 0 (range 10000000)))
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
"Elapsed time: 3615.623583 msecs"
49999995000000
Enter fullscreen mode Exit fullscreen mode

It's 3.6 seconds (instead of the 0.3 seconds we know is possible). But this is that performance penalty that can easily be smeared around your codebase.

Post Post Script

Often it's nice to let uberjars go off and do their thing rather start them up with clojure.main. Socket repls make it incredibly nice to be able to repl into an uberjar while it uses its main entrypoint to serve web requests or whatever it does. If your profiling needs some setup this is often a great way to ensure that it initializes the way it needs and you have a repl right into a good working state.

Clojure allows for some system properties to easily start up a repl, but it can be a little verbose. The following is a handy shell function I use constantly:

# huge props to p-himik on the Clojurians slack for this
# sample usage: clj -J"$(socket-repl 6000)" ... or java "$(socket-repl 6000) -jar ...
socket-repl()
{
 echo -n "-Dclojure.server.repl={:port $1 :accept clojure.core.server/repl}"
}
Enter fullscreen mode Exit fullscreen mode

Then you can easily startup repls like the following:

$ java "$(socket-repl 6000)" -cp log4j2-uber.jar:criterium.jar clojure.main
Clojure 1.11.1
user=>

## or with `clj` or `clojure` command line tools:
$ clj -J"$(socket-repl 6009)" -A:build
Enter fullscreen mode Exit fullscreen mode

Then just netcat localhost <port> and you have a repl into any running jar.

A really nice dev experience is if your editor can connect to socket repls. Doing this makes running a repl from a jar or from your regular "dev" setup almost indistinguishable. I can easily repl from a prod jar as from a branch in Clojure and my workflow is exactly the same because I use inf-clojure

Top comments (0)