(Originally published on Medium, Image by Boskampi from Pixabay)
Introduction
When it comes to logging, there are multiple concerns; I'll highlight a few in this post:
- Retention: to prevent server disk space from becoming scarce
- Search: the core use case of logs
- Formatting: aids in readability and search
Across different tools and platforms, such as programming languages and operating systems, these aspects vary significantly. One of the main selling points of OpenTelemetry is having a central place to view all those logs, whether they come from:
- A file on the local machine, like with my MySQL server
- A file on a remote machine, like with my NGINX server
- A mobile device, like with ExTrack:
As mentioned in my earlier articles, my tools of choice for OpenTelemetry come from the Grafana stack, so I’ll be using Loki.
Loki
Loki’s main description is:
Like Prometheus, but for logs.
That's why I placed this article after the Metrics one.
Loki mainly relies on labels in order to narrow down logs while searching, similar to Prometheus.

(service_name and level, 2 of the most common labels in my case)
It also supports structured metadata, which is similar to labels but not indexed, so it can hold high-cardinality values such as an IP address or a user ID. The main idea when working with Loki is to filter by a few key indexed labels, and then narrow down with the various search operators.
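As a sketch, that workflow maps onto a LogQL query like the one below: start from an indexed label selector, then narrow down with a line filter and a structured-metadata filter. (The label and metadata names match the configs later in this post; the IP is a made-up example value.)

```logql
{service_name="nginx_access_logs", status="404"} |= "GET" | remote_addr = "203.0.113.7"
```

Only the selector in curly braces touches the index; everything after the first pipe is evaluated by scanning the matching log lines.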
The 2 main ways I push to Loki are:
- Scraping the raw log files with Grafana Promtail*, which then pushes them to Loki
- Using the OpenTelemetry Java Agent in combination with Grafana Alloy using OTLP
(*Promtail is at end-of-life, Alloy is the recommended alternative)
Promtail
At a basic level, Promtail uses a pipeline to process log lines into the final output sent to Loki. A pipeline consists of a wide variety of stages; the most notable one in my case is the json stage, since I can use JSON for both the MySQL error log and the NGINX access logs.
MySQL Promtail Config
- job_name: mysql_error_logs
  static_configs:
    - targets:
        - localhost
      labels:
        service_name: mysql_error_logs
        __path__: "C:/ProgramData/MySQL/MySQL Server 8.0/Data/LAPTOP-7H9JJDHB.err.00.json"
  pipeline_stages:
    - json:
        expressions:
          time: time
          msg: msg
          prio: prio
          level_derived: prio
          level: label
          err_code: err_code
          err_symbol: err_symbol
          SQL_state: SQL_state
          subsystem: subsystem
    # https://dev.mysql.com/doc/refman/8.4/en/error-log-event-fields.html
    - template:
        source: level_derived
        template: '{{ if eq .Value "0" }}System{{ else if eq .Value "1" }}Error{{ else if eq .Value "2" }}Warning{{ else }}Note{{ end }}'
    - timestamp:
        source: time
        format: RFC3339
    - labels:
        level:
        level_derived:
    - structured_metadata:
        prio:
        err_code: err_code
        err_symbol: err_symbol
        SQL_state: SQL_state
        subsystem: subsystem
    - output:
        source: msg
- I extract all the key-value pairs using the json stage
- The timestamp stage is there to ensure Promtail uses the time from the log statement instead of the time the file was last scraped
- I chose to turn the level attribute into a label because log levels are usually the most important label, and also because the cardinality is quite low (only 4 values)
Example
Equivalent logs in Loki:
NGINX Logs
- job_name: nginx_access_logs
  static_configs:
    - targets:
        - localhost
      labels:
        service_name: nginx_access_logs
        __path__: "/var/log/nginx/access-json.log"
  pipeline_stages:
    - json:
        expressions:
          timestamp: timestamp
          remote_addr: remote_addr
          message: message
          status: status
          request_method: request_method
          hostname: hostname
          trace_id: trace_id
          span_id: span_id
          server_name: server_name
    - timestamp:
        source: timestamp
        format: RFC3339
    - labels:
        status:
        request_method:
    - structured_metadata:
        remote_addr:
        hostname:
        trace_id:
        span_id:
        server_name:
    - output:
        source: message
Here, once again, I chose just 2 labels, with a combined maximum cardinality of 4,500: 9 HTTP methods times 500 possible status codes. In practice it will be much smaller, since only a fraction of those combinations ever occur.
And here is the custom NGINX log format:
log_format json escape=json '{"remote_addr":"$remote_addr","timestamp":"$time_iso8601","message":"$request","status":"$status","request_method":"$request_method","hostname":"$hostname","trace_id": "$otel_trace_id","span_id":"$otel_span_id", "user_agent": "$http_user_agent","server_name":"$server_name"}';
access_log /var/log/nginx/access-json.log json;
Java Agent
Backend
For the backend, the Java Agent automatically exports the Logback logs via OTLP to Grafana Alloy, where some resource attributes are promoted to labels by default. The relevant ones for me are deployment.environment.name, to differentiate dev, staging, and production, and service.name.
My configuration using environment variables for OpenTelemetry and Java:
export APP_VERSION=$(cat app-version)
export OTEL_SERVICE_NAME=expense_tracker_backend
export OTEL_JAVA_AGENT_LOCATION=/opt/opentelemetry
export JAVA_TOOL_OPTIONS="-javaagent:$OTEL_JAVA_AGENT_LOCATION/opentelemetry-javaagent.jar"
export SPRING_PROFILES_ACTIVE=dev
export OTEL_EXPORTER_OTLP_ENDPOINT=http://telemetry.davidgrath.com:4318
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment.name=dev,deployment.environment=dev,service.version=$APP_VERSION
java -jar "/opt/expense_tracker/expense-tracker-backend.jar"
The domain name is my local DNS entry pointing to my Grafana Alloy instance, which is configured like this:
otelcol.exporter.otlphttp "loki_exporter" {
  client {
    endpoint = "http://localhost:3100/otlp"
  }
}

otelcol.receiver.otlp "otlp_receiver" {
  grpc {
  }
  http {
  }
  output {
    logs = [otelcol.exporter.otlphttp.loki_exporter.input]
  }
}
Example
Resulting Loki entry
Android
For the Android version, the runtime is fundamentally different from a standard JVM, so auto-instrumentation isn't available. Manual instrumentation is needed, and can be achieved with the Logback library.
I’m using Dependency Injection with Dagger, so here’s how the OpenTelemetry object is configured in my Dagger Provides method:
@Provides
@Singleton
fun openTelemetry(buildConstants: BuildConstants): OpenTelemetry {
    val logsUrl = "${buildConstants.telemetryHttpUrl()}/v1/logs"
    val resource = Resource.builder()
        .put(ServiceAttributes.SERVICE_NAME, "expense-tracker-android")
        .put("android.os.api_level", Build.VERSION.SDK_INT.toString()) // From the semconv docs
        .put("deployment.environment.name", buildConstants.environmentName())
        .build()
    val sdk = OpenTelemetrySdk.builder()
        .setPropagators(
            ContextPropagators.create(
                TextMapPropagator.composite(
                    W3CTraceContextPropagator.getInstance(),
                    W3CBaggagePropagator.getInstance()
                )
            )
        )
        .setLoggerProvider(
            SdkLoggerProvider.builder()
                .addResource(resource)
                .addLogRecordProcessor(
                    BatchLogRecordProcessor.builder(
                        OtlpHttpLogRecordExporter.builder().setEndpoint(logsUrl).build()
                    ).build()
                )
                .build()
        )
        .buildAndRegisterGlobal()
    OpenTelemetryAppender.install(sdk)
    return sdk
}
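For the install call to have any effect, Logback also has to route records to the OpenTelemetry appender. A minimal logback.xml sketch of that wiring (the appender names are my own, and I'm assuming the captureMdcAttributes option from the appender's documentation):

```xml
<configuration>
  <!-- Normal console output -->
  <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Bridges Logback events into the OpenTelemetry SDK installed above -->
  <appender name="otel"
            class="io.opentelemetry.instrumentation.logback.appender.v1_0.OpenTelemetryAppender">
    <!-- Forward all MDC entries (e.g. a device ID) as log-record attributes -->
    <captureMdcAttributes>*</captureMdcAttributes>
  </appender>

  <root level="INFO">
    <appender-ref ref="console"/>
    <appender-ref ref="otel"/>
  </root>
</configuration>
```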
Forgetting to append /v1/logs to the endpoint gave me a bit of trouble before I figured it out.
With the help of Mapped Diagnostic Context (MDC), I'm able to attach a randomly generated UUID so that I can filter my logs by deviceId:
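A minimal sketch of how attaching that value can look with SLF4J's MDC. The deviceId key and the DeviceIdProvider helper are my illustrative names, not from any library; in the real app the ID would be persisted (e.g. in SharedPreferences) so it survives restarts:

```kotlin
import org.slf4j.LoggerFactory
import org.slf4j.MDC
import java.util.UUID

// Hypothetical helper: generates the ID once per process.
object DeviceIdProvider {
    val deviceId: String by lazy { UUID.randomUUID().toString() }
}

fun main() {
    val logger = LoggerFactory.getLogger("ExTrack")
    // Every log statement on this thread now carries the deviceId, which the
    // OpenTelemetry Logback appender can forward as a log-record attribute
    // when MDC capture is enabled.
    MDC.put("deviceId", DeviceIdProvider.deviceId)
    logger.info("App started")
}
```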
Logging crashes
There are better tools for the job, including ACRA and Firebase Crashlytics. There’s even one within OpenTelemetry itself, but I wanted something basic that I could see directly in Loki.
By making use of the OpenTelemetry SDK and UncaughtExceptionHandler, I’m able to upload the log before the app shuts down:
val defaultUncaughtExceptionHandler = Thread.getDefaultUncaughtExceptionHandler()
val handler = object : UncaughtExceptionHandler {
    override fun uncaughtException(t: Thread, e: Throwable) {
        val processor = appComponent.logRecordProcessor()
        LOGGER.error("Uncaught exception", e)
        processor.forceFlush().join(200, TimeUnit.MILLISECONDS)
        processor.logRecordExporter.flush().join(200, TimeUnit.MILLISECONDS)
        // VERY IMPORTANT! Return the flow back to the system
        defaultUncaughtExceptionHandler?.uncaughtException(t, e)
    }
}
Thread.setDefaultUncaughtExceptionHandler(handler)
And I can see what happened in Loki:
This method feels a little hacky since I'm accessing the SDK directly, but I'll work with it since it gets me my logs.
Configuration summary
Conclusion
Now that my configuration is complete, I have a central place where I can search my logs across all my servers and services, while SSH remains a viable fallback.
In the next article, I’ll make use of distributed tracing to enable me to track the flow of execution from the app to the backend to the database using Tempo.
If you have any feedback, feel free to share it in the comments.
Thank you for your time.