Framing The Problem
I am working on a project and we wanted to be able to trace all the logs that came from a user calling one of our endpoints. This entailed an easy way to link together logs if they were done by the same user calling an endpoint and see the name and arguments of the function where the log originated from. This will allow us to reproduce any bugs and trace users' journeys.
To demonstrate what we wanted to achieve suppose we had the flask app:
from flask import Flask
import logging
app = Flask(__name__)
logging.basicConfig(format="%(asctime)s | %(levelname)s | %(filename)s | %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S%Z", level=logging.DEBUG)
book_db = {
0: "book_1",
1: "book_2"
}
def get_book(id):
logging.info(f"Getting book with id {id}")
return book_db[id]
@app.route("/books/<int:book_id>", methods=["GET"])
def books(book_id):
logging.info(f"Getting book")
book = get_book(book_id)
logging.info(f"Got book: {book}")
return {"books": [book]}
Then if we called the endpoint http://127.0.0.1:5000/books/1
we would like to see in our logs:
2022-03-06T14:34:16GMT | INFO | app.py | 16d99360-90ad-4e1a-8803-3996b45c6540 | books | {book_id=1} | Getting book
Where
-
2022-03-06T14:34:16GMT
is the date that the log took place. -
INFO
is the type of log. -
app.py
is where the file where the log got called. -
16d99360-90ad-4e1a-8803-3996b45c6540
is the unquie id associated with that user's flow. So any log with16d99360-90ad-4e1a-8803-3996b45c6540
, we can infer came from the same user. -
books
is the name of the function where the log got called. -
{book_id=1}
is the argument that was passed into the function. -
Getting book
is the message that we wanted to log.
We wanted all this information to be packed into our logs, with as little overhead or manual processing as possible.
Adding a Trace Id To Logs
from flask import Flask, g, has_request_context
import logging
import uuid
app = Flask(__name__)
logging.basicConfig(format="%(asctime)s | %(levelname)s | %(filename)s | %(trace_id)s | %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S%Z", level=logging.DEBUG)
@app.before_request
def before_request_func():
g.trace_id = uuid.uuid4()
old_factory = logging.getLogRecordFactory()
def record_factory(*args, **kwargs):
record = old_factory(*args, **kwargs)
if has_request_context():
record.trace_id = g.trace_id
else:
record.trace_id = uuid.uuid4()
return record
logging.setLogRecordFactory(record_factory)
book_db = {
0: "book_1",
1: "book_2"
}
def get_book(id):
logging.info(f"Getting book with id {id}")
return book_db[id]
@app.route("/books/<int:book_id>", methods=["GET"])
def books(book_id):
logging.info(f"Getting book")
book = get_book(book_id)
logging.info(f"Got book: {book}")
return {"books": [book]}
Now if we call the endpoint http://127.0.0.1:5000/books/1
we see in our logs an id associated with the log so we can clearly follow the logs of an user's flow:
2022-03-06T15:38:05GMT | INFO | app.py | 199934fe-a096-44e4-8780-31ed7a8550b6 | Getting book
2022-03-06T15:38:06GMT | INFO | app.py | 199934fe-a096-44e4-8780-31ed7a8550b6 | Getting book with id 1
2022-03-06T15:38:06GMT | INFO | app.py | 199934fe-a096-44e4-8780-31ed7a8550b6 | Got book: book_2
Adding Function Name & Arguments To Logs
Suppose we have an endpoint that calls multiple functions and those functions call other functions, but we want to keep track of which functions and arguments we should log be logging at that time. To solve this problem I drew it out so I could understand it better:
After drawing it out I noticed that we could use a stack to keep track of which function name and arguments we should be logging.
from flask import Flask, g, has_request_context
import logging
import uuid
app = Flask(__name__)
logging.basicConfig(format="%(asctime)s | %(levelname)s | %(filename)s | %(trace_id)s | %(func_name)s | %(args)s | %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S%Z", level=logging.DEBUG)
def log_function_name_and_args(func):
def inner(*args, **kwargs):
local_args = locals()
g.name_and_args.append((func.__qualname__, local_args))
logging.info("Function started")
output = func(*args, **kwargs)
logging.info(f"Function finished with output {output}")
g.name_and_args.pop()
return output
return inner
@app.before_request
def before_request_func():
g.name_and_args = []
g.trace_id = uuid.uuid4()
old_factory = logging.getLogRecordFactory()
def record_factory(*args, **kwargs):
record = old_factory(*args, **kwargs)
if has_request_context():
record.trace_id = g.trace_id
record.args = g.name_and_args[-1][1]
record.func_name = g.name_and_args[-1][0]
else:
record.trace_id = uuid.uuid4()
record.args = {}
record.func_name = ""
return record
logging.setLogRecordFactory(record_factory)
book_db = {
0: "book_1",
1: "book_2"
}
@log_function_name_and_args
def get_book(id):
logging.info(f"Getting book with id {id}")
return book_db[id]
@app.route("/books/<int:book_id>", methods=["GET"])
@log_function_name_and_args
def books(book_id):
logging.info(f"Getting book")
book = get_book(book_id)
logging.info(f"Got book: {book}")
return {"books": [book]}
Now if we call the endpoint http://127.0.0.1:5000/books/1
we see in our logs has the name and arguments of the function where the log originated from:
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | books | {'args': (), 'kwargs': {'book_id': 1}, 'func': <function books at 0x7fc357c21820>} | Function started
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | books | {'args': (), 'kwargs': {'book_id': 1}, 'func': <function books at 0x7fc357c21820>} | Getting book
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | get_book | {'args': (1,), 'kwargs': {}, 'func': <function get_book at 0x7fc357c21550>} | Function started
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | get_book | {'args': (1,), 'kwargs': {}, 'func': <function get_book at 0x7fc357c21550>} | Getting book with id 1
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | get_book | {'args': (1,), 'kwargs': {}, 'func': <function get_book at 0x7fc357c21550>} | Function finished with output book_2
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | books | {'args': (), 'kwargs': {'book_id': 1}, 'func': <function books at 0x7fc357c21820>} | Got book: book_2
2022-03-06T15:59:45GMT | INFO | app.py | 3698a090-eb05-4896-ae13-8134bb83e8fc | books | {'args': (), 'kwargs': {'book_id': 1}, 'func': <function books at 0x7fc357c21820>} | Function finished with output {'books': ['book_2']}
Wrapping Up!
We now have versatile logging that allows us to see all the logs that a user created when they hit an endpoint, as well as the name and arguments that the used by the function where the log originated from. This allows us to reproduce the user's flow and investigate a bugs in more detail.
One of the main disadvantages of this approach is that we have to use the wrapper log_function_name_and_args
on all the functions/methods so this information appears inside our logs. However in my opinion this outweighs the technical debt due to the wealth of knowledge that you gain.
If there is a way of doing this without the wrapper and applying it the whole application, please do get in touch as I would love to know!
Until next time,
Darrah 🙂
Top comments (0)