A front cover showing a CPU processor.

Structured Logging with Structlog

An introduction to structured logging and its advantages using the structlog package for Python.

Technologies
Logs Explorer
Python
structlog

Introduction

Logs can be of incredible value when you are confronted with an error somewhere in your system. Especially when you are trying to locate and resolve a critical bug! Given that there is a log that describes the error, it is nice when it contains a succinct message, and some context in the form of values of variables, tracebacks of exceptions, etcetera. In other words: logs should enable you to quickly find the information you need. In this post, we will take a look at how structured logging can help to accomplish this, along with structlog, a Python package for logging.

What is Structured Logging

So, what is structured logging exactly? Put simply, it is formatting your logs such that they are not only easy to parse by humans, but also computers. This allows us to extract information from the logs more easily, without using complex regular expressions that are impossible to understand, or maintain. Some areas where this can be of benefit are: readability, searchability, and consistency. Logs as data, not mere text.

How do we make logs structured? By using a certain data structure and serializing them to a specific format. Using JSON is commonly considered best practice, and we will be using it as well. For the data structure, structlog will help us out.

Structured Logging with structlog

To get started, we need to install structlog in our environment:

1
$ python -m pip install structlog

Next, we start a Python interpreter and import it, after which we can log a message, including some context:

1
2
3
4
>>> import structlog
>>> logger = structlog.getLogger()
>>> logger.info("Hello, world!", key="value!", more_than_strings=[1, 2, 3])
2022-12-16 11:15:57 [info     ] Hello, world!       key=value! more_than_strings=[1, 2, 3]

As can be seen, it is quite straightforward to use structlog instead of the logging module from the standard library. Just import structlog instead of logging and replace logger = logging.getLogger() with logger = structlog.getLogger().

Noteworthy is that the keyword arguments work in the same manner as a regular dictionary. As a matter of fact, under the hood it is a regular dictionary. So, if you want to log any variables together with your message, simply pass them as keyword arguments, and they will appear in the output as shown above.

Note: if you prefer to not install third-party dependencies, there is a recipe in the Python logging cookbook to implement structured logging using only the standard library.

Configuration

What we have talked about thus far is not structured logging, and before we get around to it, we must briefly discuss the notion of structlog processors.

A log processor is a regular callable, i.e. a function or an instance of a class with a call() method.

Log processors can be composed into a processor chain, and add, for instance, a timestamp, the log level, or in our case a JSON renderer:

1
2
3
4
5
6
>>> structlog.configure(
...     processors=[
...             structlog.processors.JSONRenderer(),
...             structlog.processors.dict_tracebacks,
...     ],
... )

There are lots of available processors, and lots of configuration options. Check the documentation to see what is possible.

Now that we have properly configured structlog for JSON formatted logs, we see that running the above log statement again gives the desired result:

1
2
>>> logger.info("Hello, world!", key="value!", more_than_strings=[1, 2, 3])
{"key": "value!", "more_than_strings": [1, 2, 3], "event": "Hello, world!"}

By also using structlog.processors.dict_tracebacks, we make sure exceptions also display using JSON:

1
2
3
4
5
6
>>> try:
...     1 / 0
... except ZeroDivisionError:
...     logger.exception("Cannot divide by zero!")
...
{"event": "Cannot divide by zero!", "exception": [{"exc_type": "ZeroDivisionError", "exc_value": "division by zero", "syntax_error": null, "is_cause": false, "frames": ...}]}

Beware that only log statements from the application itself will use JSON, not those from dependencies. To output logs from installed packages using JSON as well, you can take a look at python-json-logger, and additionally read up on how to set it up with structlog here.

Development & Production Setup

While having JSON logs in your production setup is nice, you might prefer regular logs for development. For example, compare our development setup:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
>>> structlog.configure(
...     processors=[
...             structlog.processors.add_log_level,
...             structlog.processors.format_exc_info,
...             structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S", utc=True),
...             structlog.dev.ConsoleRenderer(),
...     ],
...     wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
...     cache_logger_on_first_use=True,
... )

to our production setup, for which we use python-json-logger as well:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
>>> structlog.configure(
...     processors=[
...             structlog.stdlib.filter_by_level,
...             structlog.stdlib.add_logger_name,
...             structlog.stdlib.add_log_level,
...             structlog.stdlib.PositionalArgumentsFormatter(),
...             structlog.processors.StackInfoRenderer(),
...             structlog.processors.format_exc_info,
...             structlog.processors.UnicodeDecoder(),
...             structlog.stdlib.render_to_log_kwargs,
...     ],
...     logger_factory=structlog.stdlib.LoggerFactory(),
...     wrapper_class=structlog.stdlib.BoundLogger,
...     cache_logger_on_first_use=True,
... )

Another option is to use the same configuration for development as for production to reduce complexity; whichever way you think works best for your use case. The takeaway here is, that by changing the used processors in the processor chain, it is fairly easy to change the way logs are processed, and displayed.

To hook up python-json-logger with the logging code, we set the formatter in a logging.conf file. This file can be used to configure logging on application startup, ideally before any log is written. A minimal example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=jsonFormatter

[logger_root]
level=INFO
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
formatter=jsonFormatter

[formatter_jsonFormatter]
class=pythonjsonlogger.jsonlogger.JsonFormatter

You should configure the logging module before configuring structlog:

1
>>> logging.config.fileConfig("logging.conf", disable_existing_loggers=False)

Structured Logging & Google Cloud Platform

Thus, given all of the above, how does structured logging help us in a real world scenario? Let us take a look at an example.

One of our applications, which is a web app, writes its logs to stdout. It is containerized and resides in a Google Kubernetes Engine cluster. Without any further setup, these log statements are automatically picked up by Google Logs Explorer, as shown below:

Logs Explorer

Each line contains only the log statement its message, which makes the log as a whole fairly easy to read. Everything else we might have provided for context using structlog, is put into the jsonPayload field. When we want to see more details from a specific log statement, we simply expand it:

Expanded Logs Explorer

Furthermore, if we are interested in all the log statements that contain a specific field, let us say the message_id, we can click on its value in the jsonPayload, and add it to the summary line:

Tagged Logs Explorer

Now all the log statements that contain the same message_id stand out.

Conclusion

We have seen how structured logging can make your logs expressive and powerful. Specifically, how they become better readable, better searchable, and more consistent overall. Additionally, we saw this can be achieved with relatively minimal setup using a real-world scenario. If you are not using structured logging yet, you are missing out on a better development experience.

About the author

Maximilian Filtenborg

Maximilian is a machine learning enthusiast, experienced software engineer, and co-founder of BiteStreams. In his free-time he listens to electronic music and is into photography.

Read more

Continue Reading

Enjoyed reading this post? Check out our other articles.

Do you want to get more insight into your Data? Contact us now

Get more data-driven with BiteStreams, and leave the competition behind you.

Contact us