DEV Community: Matt Sicker

StackOverflow: Debug Java annotation processors using Intellij and Maven

Matt Sicker — Sat, 11 Sep 2021 22:59:32 +0000

I've been working on a new annotation processor in Log4j in order to support the new plugin dependency injection system. Inspired by the configuration performance gains in frameworks like Micronaut, I hoped to emulate part of the idea by generating metadata at compile time that can be queried at runtime to determine which plugins and beans to bother loading. This can save a decent amount of startup time on reflection and class path scanning, and it allows for caching more metadata in the future.

However, annotation processors in Java are notoriously finicky. I had initially tried to debug the code using logging, but the complexity of our Maven build makes it difficult to properly surface those debug logs without reducing build performance. Eventually, I looked into how I might be able to just set a breakpoint in my annotation processor and experiment with some live data to figure out why the API wasn't returning metadata I'd normally expect from the equivalent reflection APIs. This fantastic explanation from StackOverflow covers how to set up debugging.

In the end, most of my confusion in the annotation processing API resulted from a misunderstanding of how the @Inherited annotation works. In particular, annotations on an interface are not inherited regardless of what you try. In fact, this annotation only applies when used on annotations placed on class definitions. As it turns out, projects like Spring have a lot of boilerplate to support their fancy meta-annotations and inheritance system.
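The @Inherited behavior described above can be observed with plain reflection. The following is a minimal sketch rather than code from Log4j: the Marker annotation and the four nested types are hypothetical names made up for the demonstration, and it uses runtime reflection instead of the annotation processing API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class InheritedDemo {
    // Hypothetical marker annotation used only for this demonstration.
    @Inherited
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Marker {}

    // Annotated superclass: subclasses inherit the annotation.
    @Marker static class BaseClass {}
    static class SubClass extends BaseClass {}

    // Annotated interface: implementations do NOT inherit the annotation.
    @Marker interface BaseInterface {}
    static class Implementation implements BaseInterface {}

    static boolean isMarked(Class<?> type) {
        return type.isAnnotationPresent(Marker.class);
    }

    public static void main(String[] args) {
        System.out.println(isMarked(SubClass.class));       // prints true
        System.out.println(isMarked(Implementation.class)); // prints false
    }
}
```

Frameworks that want interface annotations to apply to implementing classes have to walk the type hierarchy themselves, which is where the boilerplate comes from.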

answer re: Debug Java annotation processors using Intellij and Maven

Jul 11 '15

Here is the recipe.

Sidenote: I made it really detailed in some cases, skip the parts you already know how to do.

First of all, download and install Maven, then download and install IntelliJ IDEA (referred to as IDEA from here on). (If you don't know how to use…


An abridged guide to using ed25519 PGP keys with GnuPG and SSH

Matt Sicker — Sun, 09 May 2021 16:00:00 +0000

I came across a great guide to using a YubiKey with SSH and GPG a couple of years ago which helped push me over the fence and acquire my own YubiKey. Following that setup guide, I set up my keys offline using a Tails Linux boot USB with a OneRNG hardware random number generator. While a fun exercise, I must note that it’s not for the faint of heart, especially if done on a recent MacBook Pro (with a Touch Bar) or incompatible hardware. However, it did help explain some of the features available in GnuPG, and this came in handy recently while exploring the new support for elliptic curve cryptography in YubiKey firmware 5.2.3, the version installed in my later YubiKey 5Ci purchase. While I originally created PGP keys using the same guide last year with RSA keys, since those keys were expiring soon, it seemed like a good idea to look into switching to Curve25519 keys. GnuPG has added some improved support for this algorithm along with supporting this updated YubiKey firmware to transfer these keys to a YubiKey. In this brief guide, I’ll go over how to generate an appropriate PGP key that can be used both in a YubiKey and for use with SSH. For more general info about using smartcards with GnuPG, see this guide from GnuPG.

About PGP Keys

PGP keys are slightly more complicated than a single keypair, and they’re fairly flexible. A key has one or more user ids (name, email address, and an optional comment), one of which is flagged as the primary uid (by default, this is the original uid used on creation). Keys have an optional expiration date along with a list of usages allowed for that specific key. Keys have one or more subkeys which allow for different keys for different use cases all on the same key. These use cases include certification (to create and certify subkeys typically), signatures, encryption, and authentication. Each key or subkey can allow one or more of those capabilities, though certification is typically reserved for the primary key. When using RSA, this is fairly simple as RSA supports all four of those capabilities. However, elliptic curves typically use separate algorithms for signing and encryption, so this will be slightly more nuanced.

Generating PGP Keys

Our master key will be created for certification with no expiration date. This is justified by the fact that the certification key will be physically tied to a YubiKey for which we can use a revocation certificate if the key is lost or stolen. The subkeys will have expiration dates to allow for key rotation as explained in the linked guide, though that is out of scope for this post. We’ll split up the remaining usage capabilities into three subkeys: a signing subkey, an encryption subkey, and an authentication subkey. Note that certification and authentication keys use signature algorithms internally, thus for our key, we’ll use ed25519 for all but our encryption subkey which will instead use cv25519.

The following commands will help us avoid using the UI for most of the work. Use this gpg.conf file in ~/.gnupg/gpg.conf for a more secure default. Then, create a certification master key (note the optional comment can be omitted entirely, no parentheses necessary) and specify a password for the key when prompted:

gpg --quick-generate-key \
    'Your Name <your.email@example.com> (optional comment)' \
    ed25519 cert never

Next, save the key fingerprint without spaces to an environment variable:

export KEYFP=123456789ABCDEF0...

Now add subkeys for signing, encryption, and authentication. These will have expiration times to allow for key rotation. For example, using a one year expiration time:

gpg --quick-add-key $KEYFP ed25519 sign 1y
gpg --quick-add-key $KEYFP cv25519 encr 1y
gpg --quick-add-key $KEYFP ed25519 auth 1y

Next, verify the keys have been added to your local keyring:

gpg -K

This should give output like:

sec ed25519/0x0123456789ABCDEF 2021-01-01 [C]
      Key fingerprint = 0000 1111 2222 3333 4444 5555 0123 4567 89AB CDEF
uid [ultimate] Matt Sicker <mattsicker@apache.org>
ssb ed25519/0x9876543212345678 2021-01-01 [S] [expires: 2022-01-01]
ssb cv25519/0x3141592653589793 2021-01-01 [E] [expires: 2022-01-01]
ssb ed25519/0x8888ABCDFDEC8765 2021-01-01 [A] [expires: 2022-01-01]

Using SSH

The authentication subkey can be used for authentication in SSH. There are a few different ways to export the PGP key into an OpenSSH format, and we’ll cover a simple one here. This method will use GnuPG as an SSH agent which allows us to both use PGP keys for SSH as well as to import and encrypt existing SSH key files into a GPG-managed SSH keyring. First, obtain a copy of this gpg-agent.conf file and save it to ~/.gnupg/gpg-agent.conf. Uncomment the appropriate pinentry program for your platform (and don’t forget to install it if you haven’t already). Next, add the following lines to your ~/.bashrc, ~/.zshrc, or whatever shell rc file you use:

export GPG_TTY="$(tty)"
export SSH_AUTH_SOCK=$(gpgconf --list-dirs agent-ssh-socket)
gpgconf --launch gpg-agent

Note that this configuration is only relevant for your local machine. If you’re using SSH agent forwarding, don’t copy this configuration to remote machines. After restarting your shell, run ssh-add -L to list the currently known SSH keys in GnuPG. Your locally stored keys should be listed here as well as your YubiKey if you’ve transferred the key there and have it plugged in. These lines correspond to SSH public keys which can be added to the ~/.ssh/authorized_keys file on the remote machines you want to SSH into. Alternatively, you can export a GPG key’s authentication subkey in SSH format directly using the following command:

gpg --export-ssh-key 0x1234ABCD1234ABCD

For use with GitHub and other git+ssh providers, add this public key to your account’s SSH keys.

Building a Deterministic Random Bit Generator

Matt Sicker — Sat, 13 Mar 2021 00:00:00 +0000

As part of developing the O(1) Cryptography library, I initially relied upon the standard Java cryptography APIs for cryptographic random bit generation. Unlike most of the cryptographic APIs in Java that are frequently misused, java.security.SecureRandom is about as simple as it gets for defining all the relevant operations of a secure random number generator. Naturally, like most of the standard APIs, this one, too, can be misused if configured incorrectly, most importantly in the underlying seed generation strategies which are platform-specific. Java provides a few different strategies out of the box, and in pure Java code, I rely on this API to seed the O(1) Cryptography deterministic random bit generators. In C, there are some nicer integrations with OS-specific system calls and even hardware-specific integrations more readily available, though Java can access some of these if they’re provided via some PKCS11 library. As a technical note, the only relevant limitation we have in Java compared to C is that the Java operations will generally require opening a file or accessing some resource which can potentially fail due to file descriptor leaks or other resource exhaustion, while the C code can make use of system calls that bypass the file interface entirely. Depending on the underlying threat model, this can be a vulnerability if the process cannot open /dev/random or equivalent device files. With that in mind, let’s take a look at how a random bit generator works.

The canonical U.S. standard for cryptographic random bit generators is NIST Special Publication 800-90A Revision 1 which specifies three mechanisms to do so using standard cryptographic primitives. These mechanisms include one based on plain hash functions, one based on keyed hash functions (HMAC specifically), and a third based on block ciphers such as AES. In O(1) Cryptography, I’ve implemented two strategies using the primitives available here: one based on a keyed BLAKE3 hash function, and another using ChaCha20 as a pseudo-block cipher. Both strategies have some commonality beyond their underlying permutations (BLAKE3 uses the same quarter-round mixing function as ChaCha20).

First, a DRBG instance is lazily initialized on a per-thread basis using system-specific seed entropy. Java makes this fairly simple with the ThreadLocal class and relying on SecureRandom for accessing system-specific entropy sources for generating a seed. C raises the challenge of not having a standard runtime, though it does have standardized thread-local storage support. On the other hand, C gives us access to lower level APIs which have their own advantages over the Java equivalents.
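The lazy per-thread initialization described above might look roughly like the following in Java. This is only a sketch under assumptions: the class name, the 32-byte seed length, and the use of nextBytes on a default SecureRandom are illustrative choices and are not taken from the O(1) Cryptography source.

```java
import java.security.SecureRandom;

class PerThreadSeed {
    // Assumed seed length for illustration; a real generator may need more.
    static final int SEED_BYTES = 32;

    // The supplier runs at most once per thread, on that thread's first get().
    private static final ThreadLocal<byte[]> SEED = ThreadLocal.withInitial(() -> {
        byte[] seed = new byte[SEED_BYTES];
        new SecureRandom().nextBytes(seed); // delegates to the platform's configured entropy source
        return seed;
    });

    // Returns a copy so callers cannot modify the thread's stored seed.
    static byte[] currentSeed() {
        return SEED.get().clone();
    }
}
```

Because the seed lives in a ThreadLocal, threads never contend on a shared generator, which matters when random data has to be produced in parallel.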

One of these APIs is the function getentropy which uses the Linux system call getrandom, a function that is used as the basis for /dev/random and /dev/urandom. On BSDs and macOS, getentropy is itself the system call with the same signature that libc borrowed. For older POSIX platforms that don’t expose a system call, reading seed data from /dev/random can be supported, though I’ve elided support for it currently as it involves file IO which starts to bloat the C library beyond what’s minimally necessary to support the Java API. On Windows, there appears to be a long history of APIs here, the most promising sufficiently low level one being RtlGenRandom, a function from the advapi32 library which has been a fairly standard base library on Windows since the XP days. An interesting source to look at would be non-standard hardware entropy sources like the OneRNG, an open source hardware RNG which is typically accessible in a platform-independent manner via serial port access APIs besides any of the integrations offered specifically for Linux.

Each implementation uses this seed data slightly differently, but they both use the seed as an initial key. The ChaCha20 implementation also uses the seed for a nonce and initial counter. In order to generate random bytes, the generator uses the keystream output (the resulting ciphertext of encrypting null bytes) for each request and then ratchets itself by using an incrementing nonce to provide forward secrecy. The BLAKE3 implementation generates bytes by finalizing the hash output of an empty input, with the first 32 bytes being used solely to rekey the hash function as its ratchet and the subsequent bytes being used as the output bytes. Both implementations maintain internal counters to track when reseeding is necessary.
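To make the ChaCha20 variant more concrete, here is a rough sketch built on the ChaCha20 cipher that ships with the JDK (Java 11 or later). It is not the O(1) Cryptography implementation: the class name, the 44-byte seed split into a 32-byte key and 12-byte nonce, and the fixed initial counter of 0 are assumptions made for the example, and the reseed counter is left out entirely.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.ChaCha20ParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class ChaChaDrbgSketch {
    private final SecretKeySpec key;
    private final byte[] nonce;

    // Assumed seed layout: 32 bytes of key followed by 12 bytes of nonce.
    ChaChaDrbgSketch(byte[] seed) {
        if (seed.length != 44) {
            throw new IllegalArgumentException("seed must be 44 bytes");
        }
        key = new SecretKeySpec(Arrays.copyOfRange(seed, 0, 32), "ChaCha20");
        nonce = Arrays.copyOfRange(seed, 32, 44);
    }

    static ChaChaDrbgSketch fromSystemEntropy() {
        byte[] seed = new byte[44];
        new SecureRandom().nextBytes(seed);
        return new ChaChaDrbgSketch(seed);
    }

    // Output is the keystream, obtained by encrypting a buffer of zero bytes.
    byte[] generate(int length) {
        try {
            Cipher cipher = Cipher.getInstance("ChaCha20");
            cipher.init(Cipher.ENCRYPT_MODE, key, new ChaCha20ParameterSpec(nonce, 0));
            return cipher.doFinal(new byte[length]);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ChaCha20 unavailable", e);
        } finally {
            incrementNonce(); // ratchet: the next request uses a fresh nonce
        }
    }

    // Treats the nonce as a little-endian 96-bit integer and adds one.
    private void incrementNonce() {
        for (int i = 0; i < nonce.length; i++) {
            if (++nonce[i] != 0) {
                break;
            }
        }
    }
}
```

Two instances created from the same seed produce identical output sequences, which is the "deterministic" part of a DRBG; all unpredictability comes from the seed.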

It may be interesting to note how much the concepts from the underlying stream cipher and extensible output function primitives simplified implementing a DRBG compared to the steps required to do the same with AES or HMAC-SHA2. Since O(1) Cryptography is an opinionated cryptography library with the goal of being easy to use and hard to misuse, this philosophy is apparent in both its APIs and its choice of cryptographic primitives. Using the current NIST standards, there is currently only one primitive available that can be as easily used: SHAKE128/SHAKE256. SHAKE is the extensible output function variant of SHA-3 which has a comparable API to BLAKE3 while being somewhat slower. Using this same pattern, it’s fairly simple to build a DRBG using any cryptographic sponge function such as Xoodyak or any sponge-style permutation like Ascon as either cipher-based generators or keyed hash ones. Some of these ideas are also included, though it remains to be seen where lightweight cryptography standards converge, so they are only available as experimental implementations.

I hope this helps demystify how a cryptographic random bit generator can be made using cryptographic primitives. DRBGs can get more complicated than this by adding an interface to accept user-provided seed data to include in its seed input, maintaining buffers, and gathering various system state to use in the reseed algorithm, though I’ve tried to keep things simple by standing on the shoulders of operating systems and hardware support instead. Plus, Java’s standard SecureRandom implementation already handles this for us where necessary.

These random generators will be an important tool in our cryptographic toolbox later on when I go over the design of other parts of O(1) Cryptography in subsequent blog posts. Random data are integral to our ability to generate keys and challenges, and the strength of our cryptosystem is only as good as its fundamental parts. Ensuring that random data can be generated fast and in parallel is a clear requirement for any proper use of cryptography, so perhaps it may help to keep in mind one of the implicit design goals of O(1) Cryptography: speed. Performance problems are a common source of security vulnerabilities when security measures get disabled due to interference with core application logic. Using high performance cryptographic primitives prevents the need to tweak security parameters improperly; using primitives that avoid overly complex configuration options goes a step further by preventing insecure tweaks in the first place.

Introducing the O(1) Cryptography Project

Matt Sicker — Sun, 07 Mar 2021 00:00:00 +0000

Cryptography is a fascinating subject at the intersection of pure math and computer science that has become nearly ubiquitous over the past several years. Similar to functional programming, cryptography holds a particular interest to me because of the pure math connections, and my work in security software engineering has brought me back into the topic over the past year or so. Diving into the standard Java cryptography APIs, I quickly discovered a tangle of strange naming conventions and poorly documented cryptographic primitives. There have been numerous academic papers written studying the widespread phenomenon of misuse of Java cryptography APIs, and after spending a bit of time with them, I can easily see why this is. Java, in its quest to remain low level and generic, oftentimes provides overly complicated APIs that, while flexible for low level use, offer very little guidance in their correct use. Some of this might be attributed to typical over-engineering common to Java APIs prior to Java 8, but this particular API is more like a C API ported directly to Java. In fact, there’s a fairly innocuous explanation for it: it’s the same basic pattern present in most historical cryptographic libraries, most of which are indeed written in C by academics with little care for engineering. Combined with the historical baggage of having to deal with cryptographic export controls back in the 1990s and early 2000s, the Java cryptography API, like most cryptographic software written before export controls were relaxed, suffered from security vulnerabilities by design.

Much has been written and discovered since these dark days, though most software still struggles with outdated cryptography practices and misuse. In an effort to help software developers incorporate strong cryptography into their applications, an effort must be made to create and use cryptographic software components that are both easy to use and hard to misuse. One of the longest running efforts with a similar philosophy is the OpenBSD operating system which has incorporated strong cryptography and security by default for the past 25 years. Unconstrained by American cryptographic export controls in its home of Canada, the project has been one of the great examples of pervasive use of cryptography and secure design throughout its codebase. With simplified export controls, particularly for free and open source software, this philosophy must expand and help improve the security, privacy, and safety of the software we all write.

Using a corpus of public domain algorithms and cryptographic knowledge, I’ve started the O(1) Cryptography Project where I’m developing an opinionated cryptography library that aims to be easy to use and hard to misuse. O(1) Cryptography is a Java and C library that bundles the latest best practices in cryptography by providing abstractions for common cryptographic use cases. The name is a pun on the idea that cryptographic algorithms should generally run in constant-time and with minimal differentiation in its observable state. These properties seem to be somewhat modern to cryptography in that computers are fast enough and powerful enough now to statistically differentiate non-constant-time cryptographic algorithmic output, though they weren’t as big a concern back at the turn of the 21st century when AES was standardized. In fact, most standardized cryptography until recently was developed almost entirely detached from the reality that one day, non-cryptographers might need to use this stuff, too. There simply aren’t enough cryptographers in existence to write custom cryptographic routines for every application that needs it, and older cryptographic libraries were not designed for non-experts. Maybe you’ve heard of the various acronyms from yesteryear like AES, DES, RC4, RSA, DSA, MD5, SHA1, and many more. Some of these primitives are still useful and secure, but they all have problems of one form or another. In particular, they all fail the two-prong test of being easy to use and hard to misuse.

For example, AES is hard to use: as it was standardized before the consensus formed that authenticated encryption was the way to go, it requires pairing with an authentication algorithm which is commonly forgotten in practice. Implementers of AES are not discouraged from writing various optimizations that leak information about the underlying encryption key. It is fairly easy to misuse AES on both the implementer side and the application side. Being a block cipher, in order to encrypt data longer than a single 128-bit block, a mode of operation must be specified, and many commonly implemented modes have their own security vulnerabilities besides a lack of authentication. Another example is RSA, the pair of signature and asymmetric encryption algorithms that are frequently misused and improperly implemented. Due to its large key size, temptations to cut corners have run rampant through history in many implementations. Misunderstandings in the use of symmetric versus asymmetric encryption have led to people using RSA to directly encrypt data, direct use of the Diffie-Hellman shared secret result for encryption, duplicate use of keys, and many other security failures.

Between 2005 and 2010, Prof. Daniel J. Bernstein at University of Illinois at Chicago published a few papers that form the basis for many of the cryptographic primitives central to O(1) Cryptography. In particular, the Salsa20/ChaCha20 family of stream ciphers and extended nonce versions, the Poly1305 one-time authenticator, and the elliptic curve Curve25519 were all detailed during this time. A more detailed listing of these foundational papers is available in the O(1) wiki, though the common theme behind the choice of primitives for this library is that they, too, are designed with the philosophy that they should be easy to use and hard to misuse. Another interesting theme is that many of these algorithm choices have been included in various IETF standards such as TLS 1.3 and SSH, so their use has clearly become far more widespread than in their initial years.

Naturally, I am not the first to develop such a library, and there is prior art that inspires this library. The general idea behind making a cryptographic library that is easy to use and hard to misuse is the central concept of the polyglot Themis cryptographic framework. The choice of algorithms featured has been strongly influenced by Prof. Bernstein’s old NaCl library which formed the basis for libsodium, the essential C library with a similar philosophy (or at least as easy to use as C can be given that it’s C). Special thanks to Frank Denis, the maintainer of libsodium and developer of much of the zig standard crypto library, for further widening the rabbit hole of cryptographic primitives to explore and support, particularly in the field of lightweight cryptography.

This library is still under heavy development, particularly in the area of documentation and the higher level APIs. Most of the primary cryptographic primitives have been ported to Java where needed, and native implementations are also available. While the primary concern is to develop the Java API first, due to the eventual inclusion of C code from the Fiat Cryptography project for formally verified elliptic curve functions, I am also considering what other languages make sense to provide facades for, especially non-JVM ones that would benefit more from the native code. Pure Java versions of the algorithms are all available, though optimized native versions are also provided as an option. Similar to Apache Log4j, it may make sense to create Scala and Kotlin APIs, but those are currently out of scope for initial release.

Exploring the ChaCha stream cipher

Matt Sicker — Sat, 20 Feb 2021 00:00:00 +0000

Stream ciphers form the basis for simpler encryption and decryption algorithms than traditional block ciphers like AES. In particular, the ChaCha family of stream ciphers form the basis of the encryption functionality in O(1) Cryptography which we’ll explore in more detail. Originally published as the Salsa family (PDF) of ciphers, ChaCha (PDF) makes some small modifications to Salsa for increased security while maintaining equivalent performance. ChaCha has since been widely standardized in various networking standards and programming language standard cryptography libraries as an alternative to AES and other ciphers. Recall that a stream cipher is an algorithm that takes a secret key and an input stream which returns an output stream of the same size of the input stream. Stream ciphers work by taking the secret key (and usually some sort of nonce or initial value which cannot be reused for the same key) and generating a stream of deterministic random bits called the keystream which is used for multiple purposes. The primary use of this keystream is to xor it with an input stream of plaintext or ciphertext to produce an output ciphertext or plaintext respectively. Given this mode of operation, compared to block ciphers which require complicated key scheduling algorithms, it can be hard to imagine why block ciphers have been so popular historically speaking. Surely a stream cipher isn’t that simple, is it?
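The xor step is small enough to show directly. In this sketch the keystream is simply passed in as a byte array; the class and method names are made up for illustration, and producing a secure keystream is the part that the cipher itself has to provide.

```java
class XorStream {
    // Combines input with the keystream byte by byte. Because xor is its own
    // inverse, the same method both encrypts and decrypts.
    static byte[] xor(byte[] input, byte[] keystream) {
        if (keystream.length < input.length) {
            throw new IllegalArgumentException("keystream shorter than input");
        }
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = (byte) (input[i] ^ keystream[i]);
        }
        return output;
    }
}
```

Applying xor twice with the same keystream returns the original input, which is also why a keystream must never be reused for two different messages: xoring the two ciphertexts together cancels the keystream and reveals the xor of the plaintexts.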

Overview

ChaCha and Salsa are stream ciphers that expand a 256-bit key into 2⁶⁴ randomly accessible streams of 2⁶⁴ randomly accessible 64-byte (512-bit) blocks. They are parameterized by a round number suffix, recommended at 20 (as in ChaCha20 or Salsa20), but also available in 8 and 12 round variants with reduced security margin. This round number controls the number of times the round function is applied to the cipher’s internal state and must be an even number. Each round applies a sequence of constant-time operations on an array of 16 32-bit words; each of its four quarter-rounds consists of four additions, four xors, and four constant-distance left rotations. The choice for these operations relies on the mechanical sympathy of how CPUs physically implement addition, xor, and shift/rotate instructions, all of which are both fast and operate in constant time regardless of input. This set of operations is also functionally complete, so despite seeming simple, they can simulate any other Boolean expression or logic gate which makes them sufficiently powerful.

Internal State

The internal state of a ChaCha cipher consists of a 512-bit block of data addressed as 16 little endian 32-bit unsigned integers. Keeping the entirety of the cipher state inside this buffer along with the operations performed on it allows CPUs to keep the state in its cache which helps maximize software performance while simultaneously preventing timing attacks based on memory access patterns common to optimized software implementations of AES. This state is initialized with a secret key, a constant value, and some input data formed from a nonce and initial counter integer. The constant value is what is known as a nothing up my sleeve number and is needed as part of the standard initialization of the cipher state, and in ChaCha, this constant is the 16-byte ASCII-encoded string “expand 32-byte k” which fills the first four little endian integer values of this state. A “nothing up my sleeve number” is an arbitrarily chosen initialization constant value that is used in a cipher where one is needed such that it’s clear that the choice of constant was arbitrary and is therefore unlikely to have intentional backdoors, a problem faced by the secrecy behind parameter choices in DES recommended by the NSA in the 1970s. Such numbers are usually encodings of mathematical constants or ASCII strings. The next eight integers consist of the 32-byte secret key interpreted as an array of eight little endian 32-bit integers key0, …, key7. Finally, the last four integers consist of the initial counter and nonce encoded similarly. This last aspect has three main variants on how to encode the input data: a 64-bit initial counter with a 64-bit nonce; a 32-bit initial counter with a 96-bit nonce; or a 128-bit nonce. These correspond to the original ChaCha cipher, the IETF standardized variant, and HChaCha respectively, the latter being used in the XChaCha variant of ChaCha which uses an extended nonce. Laying this out in a grid, the internal state looks like this:

[0x61707865, 0x3320646E, 0x79622D32, 0x6B206574,
key0, key1, key2, key3,
key4, key5, key6, key7,
input0, input1, input2, input3]

These bottom input values correspond to attacker-controlled input which are positioned to reduce the flexibility attackers have in cryptanalysis of this cipher compared to Salsa which organizes the initial state in a different configuration with attacker-controlled input values in cells 6 through 9. This state is updated by a sequence of invertible operations defined by applying a round function the round number of times.
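As a sketch of how the grid above can be filled in, the following decodes the constant, key, counter, and nonce into 16 little endian integers. It assumes the IETF layout (32-bit counter in cell 12, 96-bit nonce in cells 13 through 15); the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class ChaChaState {
    static int[] initialState(byte[] key, int counter, byte[] nonce) {
        if (key.length != 32 || nonce.length != 12) {
            throw new IllegalArgumentException("expected 32-byte key and 12-byte nonce");
        }
        int[] state = new int[16];
        ByteBuffer constant = ByteBuffer
                .wrap("expand 32-byte k".getBytes(StandardCharsets.US_ASCII))
                .order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer keyWords = ByteBuffer.wrap(key).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer nonceWords = ByteBuffer.wrap(nonce).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) {
            state[i] = constant.getInt();      // cells 0-3: constant
        }
        for (int i = 0; i < 8; i++) {
            state[4 + i] = keyWords.getInt();  // cells 4-11: key0..key7
        }
        state[12] = counter;                   // cell 12: block counter
        for (int i = 0; i < 3; i++) {
            state[13 + i] = nonceWords.getInt(); // cells 13-15: nonce
        }
        return state;
    }
}
```

Decoding the ASCII string this way yields exactly the four hexadecimal constants shown in the first row of the grid.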

Round Function

ChaCha defines its round function using a smaller sub-operation known as the quarter-round which is applied four times per round using a specified permutation. This quarter-round is responsible for the entirety of the underlying bit-shuffling taking place to produce a keystream and is defined using the following algorithm.

void quarterRound(int[] state, int a, int b, int c, int d) {
    state[a] += state[b];
    state[d] = Integer.rotateLeft(state[d] ^ state[a], 16);

    state[c] += state[d];
    state[b] = Integer.rotateLeft(state[b] ^ state[c], 12);

    state[a] += state[b];
    state[d] = Integer.rotateLeft(state[d] ^ state[a], 8);

    state[c] += state[d];
    state[b] = Integer.rotateLeft(state[b] ^ state[c], 7);
}
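One way to gain confidence in a quarter-round implementation is to run it against the worked example in section 2.1.1 of RFC 8439. The sketch below repeats the function above so that it is self-contained; the wrapper class and the rfcExample helper are names invented for this check.

```java
class QuarterRoundCheck {
    static void quarterRound(int[] state, int a, int b, int c, int d) {
        state[a] += state[b];
        state[d] = Integer.rotateLeft(state[d] ^ state[a], 16);
        state[c] += state[d];
        state[b] = Integer.rotateLeft(state[b] ^ state[c], 12);
        state[a] += state[b];
        state[d] = Integer.rotateLeft(state[d] ^ state[a], 8);
        state[c] += state[d];
        state[b] = Integer.rotateLeft(state[b] ^ state[c], 7);
    }

    // Applies one quarter-round to the inputs from RFC 8439 section 2.1.1.
    static int[] rfcExample() {
        int[] words = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(words, 0, 1, 2, 3);
        return words;
    }

    public static void main(String[] args) {
        for (int word : rfcExample()) {
            System.out.println(Integer.toHexString(word));
        }
        // prints ea2a92f4, cb1cf8ce, 4581472e, 5881c4bb (one per line)
    }
}
```

Java's int arithmetic wraps on overflow, which matches the addition modulo 2³² that the cipher requires, so no masking is needed.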

Finally, a round consists of a column round or a diagonal round in alternating sequence. A column round consists of a quarter-round applied to each of the four columns. A diagonal round consists of a quarter-round applied to four diagonal permutations of the state.

void columnRound(int[] state) {
    quarterRound(state, 0, 4, 8, 12);
    quarterRound(state, 1, 5, 9, 13);
    quarterRound(state, 2, 6, 10, 14);
    quarterRound(state, 3, 7, 11, 15);
}

void diagonalRound(int[] state) {
    quarterRound(state, 0, 5, 10, 15);
    quarterRound(state, 1, 6, 11, 12);
    quarterRound(state, 2, 7, 8, 13);
    quarterRound(state, 3, 4, 9, 14);
}

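ChaCha20 consists of 20 rounds, and the two functions above each perform one round. Therefore, a complete application of ChaCha20 applies this pair of rounds 10 times. To produce a 64-byte block of the keystream, the original input state is then added word by word to the permuted state, a final step that the permute function below leaves out.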

void permute(int[] state) {
    for (int i = 0; i < 10; i++) {
        columnRound(state);
        diagonalRound(state);
    }
}

At the end of each key block, we interpret cells 12 and 13 as a little endian 64-bit counter which is incremented in place before generating the next 64-byte keystream block.

void incrementCounter(int[] state) {
    if (++state[12] == 0) {
        ++state[13];
    }
}
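Putting the pieces together, a single keystream block can be sketched as follows. This assumes the IETF layout (32-bit counter, 96-bit nonce) and includes the final word-wise addition of the input state that RFC 8439 specifies. The class and method names are made up, and jdkBlock exists only to cross-check the sketch against the ChaCha20 cipher bundled with Java 11 and later.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.ChaCha20ParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class ChaChaBlock {
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    static void permute(int[] s) {
        for (int i = 0; i < 10; i++) {
            quarterRound(s, 0, 4, 8, 12);  // column round
            quarterRound(s, 1, 5, 9, 13);
            quarterRound(s, 2, 6, 10, 14);
            quarterRound(s, 3, 7, 11, 15);
            quarterRound(s, 0, 5, 10, 15); // diagonal round
            quarterRound(s, 1, 6, 11, 12);
            quarterRound(s, 2, 7, 8, 13);
            quarterRound(s, 3, 4, 9, 14);
        }
    }

    // Produces one 64-byte keystream block for the given key, counter, and nonce.
    static byte[] block(byte[] key, int counter, byte[] nonce) {
        if (key.length != 32 || nonce.length != 12) {
            throw new IllegalArgumentException("expected 32-byte key and 12-byte nonce");
        }
        ByteBuffer in = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        in.put("expand 32-byte k".getBytes(StandardCharsets.US_ASCII));
        in.put(key).putInt(counter).put(nonce);
        in.flip();
        int[] initial = new int[16];
        for (int i = 0; i < 16; i++) {
            initial[i] = in.getInt();
        }
        int[] working = initial.clone();
        permute(working);
        ByteBuffer out = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 16; i++) {
            out.putInt(working[i] + initial[i]); // feed-forward addition
        }
        return out.array();
    }

    // Reference keystream from the JDK: encrypting zeros yields the raw keystream.
    static byte[] jdkBlock(byte[] key, int counter, byte[] nonce) {
        try {
            Cipher cipher = Cipher.getInstance("ChaCha20");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ChaCha20"),
                    new ChaCha20ParameterSpec(nonce, counter));
            return cipher.doFinal(new byte[64]);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ChaCha20 unavailable", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32];
        byte[] nonce = new byte[12];
        System.out.println(Arrays.equals(block(key, 1, nonce), jdkBlock(key, 1, nonce)));
    }
}
```

Without the feed-forward addition the permutation could be run backwards from a keystream block to recover the key, since every operation in the rounds is invertible.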

Combined with functions to decode and encode bytes to and from arrays of integers, data streams and keystreams can be fairly easily combined. These ChaCha functions are also used for deriving subkeys in the case of ChaCha20-Poly1305 authenticated encryption and for deriving subkeys and sub-nonce data for implementing XChaCha20-Poly1305. The keystream output from ChaCha can also be used to implement a deterministic random bit generator. When combined with a message authentication function like Poly1305, ChaCha can form the basis for an authenticated encryption with associated data algorithm as standardized in RFC 8439. The ultimate advantage to using a cipher like ChaCha that permits efficient secure software implementations is that it can be widely used by less powerful devices or devices lacking intrinsic AES-related instructions for efficient secure hardware implementations. Many existing software AES implementations are vulnerable to a considerable number of attacks depending on the threat model, and proper software implementations without appropriate operating system or hardware support may suffer performance problems leading to deployment of insecure code in practice. Several of these attacks are detailed in Cache-timing attacks on AES (PDF) along with advice on how to mitigate this in software, though one of the key conclusions you might come to is that AES is something to be avoided where possible.
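For readers who want to try the RFC 8439 construction before the follow-up post, the JDK has included a ChaCha20-Poly1305 cipher since Java 11. The sketch below is not the O(1) Cryptography API; the class and method names are invented for the example, and errors are simply wrapped in an unchecked exception.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class AeadSketch {
    // Encrypts plaintext and appends a 16-byte Poly1305 tag that also covers aad.
    static byte[] seal(byte[] key, byte[] nonce, byte[] aad, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, nonce, aad, plaintext);
    }

    // Verifies the tag and decrypts; fails if ciphertext or aad were altered.
    static byte[] open(byte[] key, byte[] nonce, byte[] aad, byte[] sealed) {
        return run(Cipher.DECRYPT_MODE, key, nonce, aad, sealed);
    }

    private static byte[] run(int mode, byte[] key, byte[] nonce, byte[] aad, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("ChaCha20-Poly1305");
            // 32-byte key and 12-byte nonce; a nonce must never repeat under one key.
            cipher.init(mode, new SecretKeySpec(key, "ChaCha20"), new IvParameterSpec(nonce));
            cipher.updateAAD(aad);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AEAD operation failed", e);
        }
    }
}
```

The associated data is authenticated but not encrypted, which suits things like protocol headers that must stay readable while still being tamper-evident.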

In a future post, we’ll go over how Poly1305 works, combine it with ChaCha, and ultimately define the authenticated encryption with associated data (AEAD) algorithm XChaCha20-Poly1305 used in O(1) Cryptography. In the meantime, you can have a look at some of the academic papers and references cited in O(1) Cryptography.

The Art of Logging

Matt Sicker — Mon, 06 Nov 2017 14:00:00 +0000

All developers have attempted to debug their programs by printing lifecycle and state information to the console. This concept, sometimes known as printf debugging, can be a far more powerful tool than one might first expect. The essence of this debugging technique is the concept of logging, where developers add relevant information about the state of the running program to a log. The use of logging is vital to both developers and operators, and it is important to understand how and why to use logging from both perspectives.

Logging Fundamentals

Logging is far more than just printing to stderr, however. Typical logging systems are divided into a set of logging levels that generally define the audience and semantics of the log event. For example, in Apache Log4j 2, logging levels are divided into the following set:

Fatal: error messages that indicate that some subsystem or the entire program cannot continue execution and will terminate.
Error: error messages regarding a problem that should be handled by a human. These are generally useful for operators to alert on.
Warn: warning messages regarding potential problems that may need to be handled by a human. This level is often misused and ignored as a result.
Info: informative messages about the state of a program. These types of messages tend to be related to the lifecycle of a program and can be viewed as a way to debug the macro state of the program.
Debug: debugging information about internal states of the program. These messages are usually only helpful to the developers maintaining a program.
Trace: messages tracing the execution flow of a program. These messages are usually very low level and simply mirror the micro state of a program and generally don’t offer more information than a debugger would.

Some logging systems define other levels, but most logging systems categorize their log messages into similar buckets with similar use cases. Each level can be selectively enabled or disabled, though generally disabling one level will also disable all less severe levels. For example, if we used a logging configuration that was set to WARN as its level, then only warnings, errors, and fatal messages would be shown.
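The threshold rule can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Log4j's implementation: the class name `LevelThresholdSketch` and its methods are made up for this example, and the enum simply mirrors the level list above from most to least severe.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelThresholdSketch {
    // Hypothetical enum, ordered from most severe to least severe.
    enum Level { FATAL, ERROR, WARN, INFO, DEBUG, TRACE }

    // An event is shown when it is at least as severe as the configured threshold.
    static boolean isEnabled(Level threshold, Level event) {
        return event.ordinal() <= threshold.ordinal();
    }

    // Collect every level that would be shown for a given threshold.
    static List<Level> enabledLevels(Level threshold) {
        List<Level> enabled = new ArrayList<>();
        for (Level level : Level.values()) {
            if (isEnabled(threshold, level)) {
                enabled.add(level);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // With a WARN threshold, only FATAL, ERROR, and WARN remain enabled.
        System.out.println(enabledLevels(Level.WARN));
    }
}
```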

By simply adding severity information to log messages, we have already surpassed the functionality offered by printf, but we’ve only scratched the surface. Any given program is generally large enough to be made up of some sort of concept of modules or subsystems, so it seems like it would be useful to extend this configurable flexibility to subsystems as well. In Java programs, these tend to be separated by packages and classes, though the important concept to use here is that of a named logger. By naming the loggers used in a program, each subsystem can be independently configured to only output logs that are desired. For example, suppose a third party library is misusing the warning level and causing operations to be concerned about the health of your application. After verifying that none of the warning log messages under the logger name prefix com.example.subsystem are real warnings, we can use a higher level threshold for that set of loggers specifically while not having to disable warnings globally or modify the third party library’s source code. This also relates to the idea that logger names form a hierarchy; com.example is the parent of com.example.subsystem. This allows for simpler ways to configure entire subsystems in one setting.
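To make the hierarchy idea concrete, here is a minimal sketch of how an effective level might be resolved by walking up dotted logger names. It is an assumption-laden illustration rather than how Log4j actually does it: `LoggerHierarchySketch`, `setLevel`, and `effectiveLevel` are hypothetical names, and levels are kept as plain strings for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class LoggerHierarchySketch {
    // Hypothetical configuration: logger name -> threshold level name.
    private final Map<String, String> configured = new HashMap<>();
    private final String rootLevel;

    LoggerHierarchySketch(String rootLevel) {
        this.rootLevel = rootLevel;
    }

    void setLevel(String loggerName, String level) {
        configured.put(loggerName, level);
    }

    // Walk from the full logger name up through its parents until a
    // configured level is found; fall back to the root level otherwise.
    String effectiveLevel(String loggerName) {
        String name = loggerName;
        while (!name.isEmpty()) {
            String level = configured.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            name = dot < 0 ? "" : name.substring(0, dot);
        }
        return rootLevel;
    }

    public static void main(String[] args) {
        LoggerHierarchySketch config = new LoggerHierarchySketch("WARN");
        // Raise the threshold only for the noisy third party subsystem.
        config.setLevel("com.example.subsystem", "ERROR");
        System.out.println(config.effectiveLevel("com.example.subsystem.Parser"));
        System.out.println(config.effectiveLevel("com.example.Other"));
    }
}
```

A single setting on com.example.subsystem covers every logger beneath it, while loggers elsewhere keep the root threshold.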

At this point, we have a rather powerful abstraction over log event filtering using both a level and a name, but we can do better! An additional piece of metadata can be attached to log events: markers. A marker is a simple text string to mark some sort of cross-cutting concern of a particular log message. This can be used to help route specific log messages to different logging systems. For example, suppose a log message is marked with the ALERT marker. The logging configuration could have a filter for that marker which would route these messages regardless of level or logger name to a particular destination. This might be an alerts channel in Slack or an alerts mailing list.
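As a rough sketch of marker-based routing, the following plain Java picks a destination from the marker alone, ignoring level and logger name. The destination names `alerts-channel` and `default-log` and the `route` method are hypothetical; a real logging configuration would express this as a filter rather than hand-written code.

```java
public class MarkerRoutingSketch {
    // Choose a destination for a log event. The marker takes priority over
    // the level and logger name, which are deliberately ignored here.
    static String route(String level, String loggerName, String marker) {
        if ("ALERT".equals(marker)) {
            return "alerts-channel"; // hypothetical destination, e.g. Slack or a mailing list
        }
        return "default-log"; // hypothetical destination for everything else
    }

    public static void main(String[] args) {
        System.out.println(route("DEBUG", "com.example.billing", "ALERT"));
        System.out.println(route("ERROR", "com.example.billing", null));
    }
}
```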

In some programming languages such as Java, string manipulation is considered a somewhat low level operation, thus there are certain string templating features not present here that would be useful for logging. For example, logging a message that contains values from some local variables would normally require string concatenation, and if that log message is never displayed, then said concatenation was wasted CPU effort. Little things like this can add up over time to form a significant performance overhead, so we can certainly do better! Enter the parameterized log message which is quite similar to a parameterized SQL query in spirit. In Log4j and many other logging systems, parameters are specified by {} placeholders in the log message and provided as additional parameters to the logging method. For example:

logger.debug("User {} logged in", user.getName());

The placeholder is filled in only if debug logging is enabled, so the full string is never computed unless absolutely necessary. This technique is mostly relevant to languages like Java. In the Scala version of Log4j, for example, string templates are a built-in feature of the language, and macros are used behind the scenes to avoid the template rendering when logging is disabled. Example:

logger.debug(s"User ${user.getName} logged in")

One more related API that is handy to know is that a lambda function can be provided instead of a string in order to defer some code needed to assemble a log message only when enabled. For example, suppose we wish to go fetch some additional metadata from a database for some debug log message. This overhead might be unacceptable most of the time, but we may wish to selectively enable it once in a while. The entire body of the function can be encapsulated into a lambda function and passed to the logger. This is generally cleaner than surrounding the code with if checks for the relevant log level or other noisy techniques.
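The deferral can be illustrated without any logging library at all. The sketch below uses a hypothetical `debug` method that accepts a `java.util.function.Supplier`, plus a made-up `fetchUserMetadata` helper standing in for the database lookup; with Log4j the call site would look similar, with the lambda passed to the logger's own debug method.

```java
import java.util.function.Supplier;

public class LazyMessageSketch {
    static int expensiveCalls = 0;       // counts how often the costly lookup runs
    static boolean debugEnabled = false; // stand-in for the configured log level

    // Hypothetical stand-in for fetching extra metadata from a database.
    static String fetchUserMetadata(String user) {
        expensiveCalls++;
        return user + " [role=admin]";
    }

    // Hypothetical minimal logger method: the supplier is only invoked
    // when debug logging is enabled.
    static void debug(Supplier<String> message) {
        if (debugEnabled) {
            System.out.println("DEBUG " + message.get());
        }
    }

    public static void main(String[] args) {
        // Debug is disabled: the lambda body never runs.
        debug(() -> "Loaded " + fetchUserMetadata("alice"));
        System.out.println("lookups so far: " + expensiveCalls);

        // Debug is enabled: the lambda body runs exactly once.
        debugEnabled = true;
        debug(() -> "Loaded " + fetchUserMetadata("alice"));
        System.out.println("lookups so far: " + expensiveCalls);
    }
}
```

The call site stays a single statement in both cases, which is the main readability gain over wrapping the lookup in an explicit level check.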

There are far more features that can be covered regarding how to use a logging API from the developer’s point of view, but these are mostly convenience features regarding repetitive things like thread-local information always included in a log message, or structured log messages, generic event logging, and others. Far more information about these features is available in the Log4j manual.

Where Do Log Events Go?

Now that we’ve established a general framework for writing and filtering log events, what can we do with them? The simplest implementation of handling log events would be to print each log message to the console separated by new lines. Since quite a bit of context would be lost doing it this way, we generally include additional information from the log event such as the timestamp, log level, marker (if defined), logger name, and thread name for multithreaded programs. All the fields we wish to output should be configurable, and in fact, there are several different fields available which we can add to provide context about the log message. The output format could also use a structured format such as JSON which is more easily parsed than line-oriented log messages, though most log aggregation and search tools have powerful features to extract log event information from all sorts of formats.
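A line-oriented layout along these lines might be sketched as follows. The field order and separators are arbitrary choices for this example, and `LineLayoutSketch.format` is a hypothetical helper, not a Log4j layout; real logging frameworks make the pattern configurable.

```java
import java.time.Instant;

public class LineLayoutSketch {
    // Render one log event as a single line. The marker is optional and is
    // omitted from the output when it is null.
    static String format(Instant timestamp, String level, String marker,
                         String loggerName, String threadName, String message) {
        StringBuilder line = new StringBuilder();
        line.append(timestamp).append(' ').append(level).append(' ');
        if (marker != null) {
            line.append('(').append(marker).append(") ");
        }
        line.append('[').append(threadName).append("] ")
            .append(loggerName).append(" - ").append(message);
        return line.toString();
    }

    public static void main(String[] args) {
        // A fixed timestamp keeps the example output reproducible.
        System.out.println(format(Instant.ofEpochSecond(0), "INFO", null,
                "com.example.Main", "main", "Application started"));
    }
}
```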

In some use cases, writing log events to stderr is acceptable. For example, during development of a program, the developer may wish to view log events while running the program in the console. On the other hand, perhaps you’re using an orchestration framework such as Apache Mesos to execute all your applications. Such a framework can be configured to watch the stderr streams of all running applications in order to collect log messages to a central location.

For many use cases, however, simply printing to a console that nobody looks at is not a valid strategy. Any website that requires a reasonable SLA and has more than, say, a few hundred users, generally requires multiple servers to distribute load. As the number of nodes increases, it simply becomes infeasible to watch the console. In fact, each node may be executing multiple applications, so without a program like tmux, we’d have to redirect the stderr of each program to a file anyway. With that in mind, we can directly configure the logging framework to output log events to a file instead of stderr. Each file can be monitored using a program such as tail to continually watch for new log events being appended to the file. This style of logging is pervasive in typical GNU/Linux and BSD systems where many running services will output log information to /var/log/ directories. However, if this is not configured to periodically rotate log files and delete old ones, then the server’s disk space can eventually fill up with log information! This job is typically filled by a program such as logrotate, though Log4j has a rolling file appender which provides similar functionality.

Simply outputting to a log file can be a good strategy for operators who are still stuck in the “do it by hand” mindset, but we can do better! Our main goal here should be to collect logs from all our servers into a central, searchable location. One such way to accomplish this is by using a product such as ELK, Fluentd, or Graylog. These tools offer more than just log aggregation; they offer ways to filter, sort, search, and alert based on the contents of the logs. However, by relying on log files, we’re also relying on the stability of the individual servers. Obtaining logs in disaster scenarios is generally more difficult but also far more important, so let’s improve on that.

Apache Flume is a project for collecting and aggregating large volumes of log events in a distributed computing environment. This is very useful in cluster scenarios such as running dozens or hundreds of Apache Hadoop or Apache Spark nodes for example. Individual nodes can pass along log events to a Flume agent, and each agent is responsible for reliably delivering the log events elsewhere. Combined with the Flume appender, this can be easily utilized in a distributed environment to collect all log events to a central log aggregator. Said aggregator may be something complex like Logstash or Graylog, or perhaps it may be something simple like a single master log file.

Now that we have our logs all in one place, we can really step up the operations game. We can set up alerts based on log level thresholds, number of messages, frequency of messages, and triggers based on any metadata contained within. If we want to get really fancy, we can train some machine learning models via Spark or Apache Mahout combined with any other exported metrics data to attempt to predict failure of our services. Such a technique could also be used for all sorts of observability of clusters and microservices. Combined with scripts to automatically scale or restart services, operations can become more proactive in maintaining their systems.

There are dozens more frameworks, libraries, and tools that could be covered here. Logging is something all developers do whether they’re using the proper tools or not, so it’s a great idea to get familiar with the tools and concepts in order to improve the metadata being created by applications. Developers should work closely with operators (devops) in order to find a good balance of logging verbosity and observability. Managing logs is a complex topic that many people tend to overlook, but having a good logging architecture in place can help save the day during a production issue. As a final note, to those using the Java Platform, Apache Log4j 2 is the premier logging library for Java, Scala, Groovy, Kotlin, and any other JVM language. It is common for logging to add noticeable overhead to applications, and the typical solution is to simply disable logging, but this removes all the advantages to logging in the first place! Instead, take a look at the numbers and see how Log4j can be used with very minimal overhead, even in high frequency trading applications.