DEV Community: Matthew Gale

Application and Webserver Logging in Spring Boot 3.1

Matthew Gale — Fri, 08 Sep 2023 18:15:35 +0000

Whenever I start a new Spring Boot project, I have to relearn how logging works to configure it. After so many years and new versions of frameworks, I decided to go deep and break down logging in Spring Boot in the modern age, teasing apart the various logging libraries and configurations to set the record straight.

In a run-of-the-mill Spring Boot app, you'd like:

Application logs for Spring Boot 3.1 (using the default Logback via SLF4J)
Webserver access logs (using the default embedded Tomcat v10)
All logs going to the same sink (locally, that’s STDOUT)

The minimal configuration you need:

Dependencies

In your pom.xml:

<dependencies>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-access</artifactId>
        <version>1.4.11</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-core</artifactId>
        <version>1.4.11</version>
    </dependency>
</dependencies>

A Configuration File

In src/main/resources/ create a logback-access.xml.

This example is very close to the default pattern from Spring Boot:

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="ch.qos.logback.access.PatternLayoutEncoder">
            <pattern>%t{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}  INFO %-5.5(0) --- [%15.15I] %-40.40(org.apache.tomcat) : %requestURL %statusCode</pattern>
        </encoder>
    </appender>

    <appender-ref ref="STDOUT" />
</configuration>

A Bean

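import ch.qos.logback.access.tomcat.LogbackValve;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AppConfig {

    // Replaces the auto-configured Tomcat factory so the access-log valve can be attached.
    @Bean
    public TomcatServletWebServerFactory tomcatServletWebServerFactory() {
        TomcatServletWebServerFactory tomcatServletWebServerFactory = new TomcatServletWebServerFactory();
        LogbackValve logbackValve = new LogbackValve();
        // Points the valve at the logback-access.xml created in src/main/resources/.
        logbackValve.setFilename("logback-access.xml");
        tomcatServletWebServerFactory.addContextValves(logbackValve);
        return tomcatServletWebServerFactory;
    }
}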

With these pieces together, Tomcat’s access logs are written alongside your application logs.

But what even is all of this? Why is this so cumbersome?

To refresh, Java has a TON of logging frameworks. To name a few popular ones:

java.util.logging (aka JUL)
Log4j
Logback
Log4j2

Which one you pick is up to you; what’s important here is how Spring Boot detects what you want and uses it.

SLF4J (Simple Logging Facade for Java) is a popular abstraction library over the most used logging frameworks. It allows you (or in this case, Spring) to interact with a facade instead of the underlying logging library, so you can pick whichever logging library appeals to you at no additional development effort.

However: Log4j2 has diverged from the contract SLF4J expects, and without an adapter SLF4J can’t service Log4j2.

A small detail to mention: the authors of SLF4J are also the authors of Logback, and so the usage of them together is often assumed, unless otherwise stated.

Logging in Spring

To make configuration easier, Spring has created a library called spring-jcl (Spring’s bridge for the Jakarta Commons Logging API) that searches your classpath and attempts to automatically configure your logging based on what it finds. spring-jcl has a preferred order in which it selects logging frameworks, but the Spring Boot starter is set up to use Logback via SLF4J. You can use any logger you’d like, though; you just have to configure it.

If you use Lombok, you can use its logging annotations to choose which logger you’d like to use in any given class. For example, in the default config for Spring Boot, you can use @Slf4j to use Logback. If you’re configured for Log4j via SLF4J, you could use @Slf4j, or use @Log4j if you’d like to use the bare logger.

As an aside, recall that because SLF4J and Logback are often synonymous, there’s no need for a @Logback helper from Lombok: @Slf4j would usually use Logback anyway!

Logging in Tomcat

Because we embed Tomcat in our applications now, it’s easy to forget that Tomcat is a web server with a servlet container, so its actions and concerns are very different from those of our application code.

For example, if you attempt to visit an unknown route (resulting in a 404), that request never makes it to the application container. Tomcat is aware of the known routes, and rejects the request before it makes it to your application. That 404 is recorded in Tomcat’s access logs, which by default go to a file. So if we want access logs alongside our application logs, we need to configure Tomcat to use a different appender, which means informing it about which logging framework we’re using so it can direct its logs there.

Tomcat is configurable in that during initialization, you can pass it a logging pipeline (a series of org.apache.catalina.Valve instances) to handle your logs, and in our case, direct the flow of logs to an appender we configure and control.

Sticking with the default Logback for Spring Boot, a project exists from the Logback team called logback-access that comes with a valve (ch.qos.logback.access.tomcat.LogbackValve) we can use to direct the log stream. A quirk to be aware of is that while logback-access might have many class names identical to ones in logback-core, their behaviour is different and they are not interchangeable: logback-access is more HTTP specific. The logback-access documentation covers the details.

We’re not all DBAs: Indexes For Developers

Matthew Gale — Thu, 24 Oct 2019 16:14:52 +0000

We know they speed up queries, but what’s going on under the hood? How do they work?

An index is a structure (commonly a B-Tree, but not required) that we attach to a table that keeps certain columns of that table organized and in memory. Indexes are a solution to the age-old problem that going to disk is slow: by caching data in memory, you save yourself time reading records from disk that will mostly be discarded. It’s more efficient to keep commonly queried columns in a searchable in-memory store with a reference to where on disk the rest of that row can be found. Having an indexed column lets you find what you need quickly and go to disk specifically for what you need.

There are lots of explanations of B-Trees; Markus Winand has a beautiful one. I also give a hearty shout-out to Markus in general: his site and book are full of great content and his explanations are amazing. Highly recommended.

As a developer needing to work with and manipulate databases constantly, there are a few useful points on indexes that tend to be forgotten. Let’s consider a few.

An indexed column’s values don’t have to be unique. Cardinality, in database jargon, refers to “how unique” the values in an index are; high cardinality means the underlying values have little repetition. When evaluating a column to decide if we want to put an index on it, high cardinality (low repetition) is good for selectivity, but a column doesn’t need to have perfect cardinality to be a candidate. Indexes are able to handle non-unique values by scanning sequentially through the leaves of the underlying tree. The linkage between tree leaves helps prevent unnecessary operations stemming from jumping around through the internals of the tree, making it less costly to navigate through the entries in the index. Scanning data in this way makes doing an equality check (val = 'matt') with multiple results a more lightweight operation. We can leverage the same structure for other scans through B-Tree indexes too: things like range scans (val > X and val <= Y) and even some pattern matches (first_name like 'matt%'). Careful though: even when you’re reading from an index in a range, there can still be a lot of data to read, making your query slow.

When a column’s values are unique, we can add a uniqueness constraint when creating the index, which allows the optimizer to leverage that uniqueness for lookup performance.
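
To make the leaf-scanning idea a bit more concrete, here is a rough Java sketch that stands in for a single-column index. It is only an illustration under assumptions: the class SortedIndexSketch, its method names and the sample rows are all made up, and java.util.TreeMap is a red-black tree rather than a B-Tree, so it mimics only the sorted ordering that makes equality, range and prefix lookups cheap, not how any particular DBMS stores its index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a single-column index; not a real DBMS structure.
class SortedIndexSketch {
    // Indexed value -> ids of the rows holding it (non-unique values share one entry).
    private final TreeMap<String, List<Integer>> entries = new TreeMap<>();

    void add(String value, int rowId) {
        entries.computeIfAbsent(value, k -> new ArrayList<>()).add(rowId);
    }

    // val = ?
    List<Integer> equalTo(String value) {
        return entries.getOrDefault(value, List.of());
    }

    // val > low AND val <= high: walk the sorted entries between the two bounds.
    List<Integer> range(String low, String high) {
        List<Integer> rowIds = new ArrayList<>();
        for (List<Integer> ids : entries.subMap(low, false, high, true).values()) {
            rowIds.addAll(ids);
        }
        return rowIds;
    }

    // val LIKE 'prefix%': start at the prefix and stop at the first key that no longer matches.
    List<Integer> startingWith(String prefix) {
        List<Integer> rowIds = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : entries.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            rowIds.addAll(entry.getValue());
        }
        return rowIds;
    }

    public static void main(String[] args) {
        SortedIndexSketch index = new SortedIndexSketch();
        index.add("matt", 1);
        index.add("matthew", 2);
        index.add("matt", 3);
        index.add("anna", 4);
        index.add("zoe", 5);

        System.out.println(index.equalTo("matt"));      // [1, 3]
        System.out.println(index.range("a", "n"));      // [4, 1, 3, 2]
        System.out.println(index.startingWith("matt")); // [1, 3, 2]
    }
}
```

Each lookup returns row ids only; in a real database every one of those ids can still mean a trip to disk for the rest of the row, which is why a wide range can be slow even when the index is used.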

Multi-column indexes. Firstly, if you didn’t know you could do this, now you do! The underlying behavior and implementation of these indexes vary by DBMS, but across the board, multi-column indexes are versatile in that they can be used to query across all or a subset of columns in the index. To greatly simplify (this varies by DBMS), think of a table with 3 columns: col1, col2 and col3. When the multi-column index is created, the values of all 3 columns are combined into a three-part “value”, col1|col2|col3, and sorted. When a query is done, we traverse the tree by comparing against each “section” of the value one at a time and navigate the tree that way. With this arrangement, by supplying 3 values, we can traverse the tree as quickly as we would with a single-column index, but with greater selectivity, since querying by just one column might yield a ton of data to be returned. I mentioned that multi-column indexes support subsets of columns: queries can make use of a multi-column index if we query by only col1, by col1 and col2, or by all three. Each column we add to the query increases the effectiveness of the index lookup because we are adding selectivity and so have fewer rows to pull back from disk. In some schema designs, multi-column indexes can be a performance boost because we’re able to eliminate so many rows by being selective in our seeking.

Index only scans. With multi-column indexes, it’s possible that the data you’re looking for are held by an index and, therefore, in memory. With the data so available, there’s no need to go to disk if the index can simply give us the data we need. Say you had a table with columns first_name, last_name and age in an index and you wanted to find the age for people named “Matt Gale”. Your query could look like select age from person where first_name = 'Matt' and last_name = 'Gale'. In this case there would be no need to go to disk because age is part of the index, so the optimizer just returns age values directly from the index. This can be a really nice resource saver if you can exploit it.
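
The two ideas above, lookups on a leading subset of columns and index only scans, can be sketched in Java as follows. This is a simplified illustration, not how any DBMS implements it: CompositeIndexSketch, its Entry record, the method names and the sample rows (including the ages) are hypothetical, and a java.util.TreeSet stands in for the B-Tree. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an index on (first_name, last_name, age).
class CompositeIndexSketch {
    // One index entry: the three indexed columns plus a pointer to the full row.
    record Entry(String firstName, String lastName, int age, int rowId) {}

    // Entries are ordered column by column, like the combined col1|col2|col3 value.
    private static final Comparator<Entry> ORDER = Comparator
            .comparing(Entry::firstName)
            .thenComparing(Entry::lastName)
            .thenComparingInt(Entry::age)
            .thenComparingInt(Entry::rowId);

    private final TreeSet<Entry> entries = new TreeSet<>(ORDER);

    void add(String firstName, String lastName, int age, int rowId) {
        entries.add(new Entry(firstName, lastName, age, rowId));
    }

    // WHERE first_name = ? : only the leading column of the index is supplied.
    List<Integer> rowIdsByFirstName(String firstName) {
        List<Integer> rowIds = new ArrayList<>();
        Entry start = new Entry(firstName, "", Integer.MIN_VALUE, Integer.MIN_VALUE);
        for (Entry entry : entries.tailSet(start, true)) {
            if (!entry.firstName().equals(firstName)) {
                break;
            }
            rowIds.add(entry.rowId());
        }
        return rowIds;
    }

    // SELECT age ... WHERE first_name = ? AND last_name = ? : answered from the
    // index entries alone, without following rowId back to the table.
    List<Integer> agesByFullName(String firstName, String lastName) {
        List<Integer> ages = new ArrayList<>();
        Entry start = new Entry(firstName, lastName, Integer.MIN_VALUE, Integer.MIN_VALUE);
        for (Entry entry : entries.tailSet(start, true)) {
            if (!entry.firstName().equals(firstName) || !entry.lastName().equals(lastName)) {
                break;
            }
            ages.add(entry.age());
        }
        return ages;
    }

    public static void main(String[] args) {
        CompositeIndexSketch index = new CompositeIndexSketch();
        index.add("Matt", "Gale", 35, 1);
        index.add("Matt", "Smith", 41, 2);
        index.add("Anna", "Gale", 29, 3);
        index.add("Matt", "Gale", 52, 4);

        System.out.println(index.rowIdsByFirstName("Matt"));      // [1, 4, 2]
        System.out.println(index.agesByFullName("Matt", "Gale")); // [35, 52]
    }
}
```

Both lookups jump to a starting point in the sorted entries and read forward until the supplied columns stop matching. The second one never touches rowId, which is the essence of an index only scan.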

In my experience building applications, Object Relational Mappings (ORMs) tend to grab too many columns during a query and developers don’t give much thought to lookups by more than just a single value and so favour single column indexes as a result. To be able to leverage index only scans, look into lazy loading more of your columns to see if you can squeeze some performance out of your queries.

Something to always keep in mind when doing any query against a database is: how much data will I get back from this query? Queries can be slow because of a lack of proper indexes, but also because of the volume of disk accesses required.

For example, let’s say you have a large table with an index on a boolean column (very low cardinality). If you tried to search for rows matching ‘false’, your time spent doing disk access is going to be huge (you could be returning half the table!) relative to the tiny amount of time spent looking values up in an index. This is a situation where indexing is not what will save you time; you need to be more selective in the rows you want to get back. Indexes are an important part of database performance but they are not the only thing to consider.

I hope this gives a bit of insight! We don’t all have to be DBAs to write sufficiently fast queries and we shouldn’t need to be. As developers, getting familiar with the core structures of a database is a sufficiently pragmatic way to spot and improve performance. With that, go forth and write fast queries!

SQL In The Real World: Setting Expectations For Your First Job

Matthew Gale — Fri, 11 Oct 2019 15:48:26 +0000

I’ve noticed an influx of fresh grads from bootcamps and universities who are nervous about SQL “in the real world”. New developers fear they don’t know or understand SQL well enough to get through an interview, let alone perform well in the job. To a newcomer, SQL feels vast and illogical compared to other programming they’re used to. They know SQL is important, but it isn’t emphasized in coursework or training to a point where they feel confident in being evaluated.

I understand how you feel and I’m here to help. I’ve interviewed dozens of developers fresh out of school over the years. I’ve worked as an enterprise backend developer for my entire career and I know the profile I want fresh developers coming into my team to have on day one. I want to set some reasonable expectations for you for what you should know walking into your first set of interviews, and your first days on the job.

No one expects you to know everything, and an entry-level interview shouldn’t try to blow your mind with all the things that you don’t know. You’ll be evaluated on practical, applicable knowledge. Here are the 4 questions I ask the juniors who land in my lap, with a bit of explanation as to why I value them.

Given a sample schema, write a basic SQL query.

Show me you can use a JOIN and a WHERE clause: a straightforward SQL query using SELECT, FROM, and WHERE to get a look at the data. I typically ask you to join one table to another, using one or more join types, so you should know the differences between them:

join (aka inner join)
left join (aka left outer join)
right join (aka right outer join)
full join (aka full outer join)
cross join

If you’re still shaky on joins, there are some great, well-travelled articles and diagrams on join types from Jeff Atwood, or this nice explanation from Julia Evans. Read up on them and practise a few problems on HackerRank and LeetCode. You want to be at a point where if you saw that a left join was something you wanted to do, you’d understand the syntax and the usage to make that happen.

Bonus points! What are indexes and how do they work? I want to be clear that this is only a bonus. Not knowing about indexes is not something I would ever eliminate a candidate for, but understanding their role and being able to apply them will definitely win you respect in an interview. Mentioning query performance and index usage when formulating queries in interview questions definitely shows you’re thinking of one of the biggest practical aspects of databases.

Model me a “many to many” relationship.

I would never tell you to jump through a hoop if I wasn’t so positive you’d be asked this question. It’s a popular question for a reason- it demonstrates to me that you can think relationally, and have a little bit of design sense when it comes to crafting and making changes to a DB schema. This is a very common problem, and there is a common solution all developers working with a relational database should know.

What is the difference between = NULL and IS NULL?

This is one of those general usability aspects of SQL that all developers should have in their tool belt. Now that I’ve drawn your attention to it, learn it, and have it at your disposal so you recognize when it needs to be applied.
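
As a starting point for that learning, here is a small Java model of how standard SQL evaluates the two forms. It is a teaching sketch rather than anything a database actually runs: NullComparisonSketch and its methods are invented names, UNKNOWN is modelled as a Java null, and some databases have settings that change this behaviour, so check the documentation for yours.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of SQL's three-valued logic for NULL comparisons.
class NullComparisonSketch {
    // "a = b": the result is UNKNOWN (modelled here as null) when either side is NULL.
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    // "a IS NULL": always TRUE or FALSE, never UNKNOWN.
    static boolean sqlIsNull(Object a) {
        return a == null;
    }

    // A WHERE clause keeps a row only when its predicate is TRUE.
    static boolean whereKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    // Rows kept by: WHERE last_name = NULL
    static long countEqualsNull(List<String> lastNames) {
        return lastNames.stream().filter(v -> whereKeeps(sqlEquals(v, null))).count();
    }

    // Rows kept by: WHERE last_name IS NULL
    static long countIsNull(List<String> lastNames) {
        return lastNames.stream().filter(v -> whereKeeps(sqlIsNull(v))).count();
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<String> lastNames = Arrays.asList("Gale", null, "Smith");

        System.out.println(countEqualsNull(lastNames)); // 0
        System.out.println(countIsNull(lastNames));     // 1
    }
}
```

The comparison with = never evaluates to TRUE when NULL is involved, so it silently returns no rows, while IS NULL finds the row with the missing value.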

What’s a prepared statement and why do we use them?

I don’t expect an extremely technical answer here in terms of what happens at the database level, but one term I would expect to hear from you is “SQL injection”. It’s important to know that we shouldn’t just do string substitution for parameters in queries or run bare string queries against a database; we need to protect ourselves from malicious input.
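
A minimal JDBC sketch of the difference, offered as an illustration only. The person table, its columns, the class PreparedStatementSketch and its method names are assumptions made up for the example, and agesByFirstName needs a real Connection from whatever database and driver you use, so main only exercises the unsafe string building.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example contrasting string concatenation with a prepared statement.
class PreparedStatementSketch {
    // Unsafe: the input becomes part of the SQL text itself.
    static String concatenatedQuery(String firstName) {
        return "SELECT age FROM person WHERE first_name = '" + firstName + "'";
    }

    // Safer: the SQL text is fixed and the value is supplied separately as a parameter.
    static final String PARAMETERIZED = "SELECT age FROM person WHERE first_name = ?";

    static List<Integer> agesByFirstName(Connection connection, String firstName) throws SQLException {
        List<Integer> ages = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(PARAMETERIZED)) {
            statement.setString(1, firstName); // bound as data, never parsed as SQL
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    ages.add(rows.getInt("age"));
                }
            }
        }
        return ages;
    }

    public static void main(String[] args) {
        String malicious = "x' OR '1'='1";
        // The input has rewritten the WHERE clause so that it matches every row:
        // SELECT age FROM person WHERE first_name = 'x' OR '1'='1'
        System.out.println(concatenatedQuery(malicious));
    }
}
```

With the prepared statement, the same malicious string is just an odd first name that matches nothing, because the query structure was fixed before the value arrived.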

That’s it! Hopefully you feel a little less daunted.

All in all, don’t live and die by the coding interview prep sites. You can spend weeks on HackerRank and LeetCode going through problems of increasing difficulty and it can turn into a rabbit hole of “how much is enough”. Practising will make you stronger and I advise it, but also remember that in the workplace, schemas are not clean and contrived like they are on these practice sites; they’re flawed and imperfect. Designing and dealing with these beasts will be its own adventure that you won’t need to tackle under time pressure or with another developer evaluating you over your shoulder.

My advice is to get comfortable with the basics. When you have these, as an interviewer and coworker, I can work with you through most problems and we can have a discussion about them, which helps me understand you and vice versa, to get to the end goal: you shipping code.