While Prometheus itself does not offer a clustered storage solution to store data across multiple machines out of the box, there are a number of so-called Long-Term Storage (LTS) options available. In this article we do a high-level review of some of the LTS options that were the topic of a PromCon 2018 panel I had the pleasure to witness in person, back in August of this year.
Note that below I won't provide a recommendation as to which LTS solution you should pick, since that very much depends on your specific requirements and preferences.
Prometheus stores data on local storage, which limits the data you can query or otherwise process to the most recent days or weeks, depending on how much local disk space you have available. If, however, you need to retain data for longer time periods, for example for long-term capacity planning, analyzing usage trends, or for regulatory reasons in a certain vertical (think: financial domain or health care), then you will likely benefit from an LTS solution. Let's have a look at the offerings discussed in the PromCon panel (in alphabetical order):
Cortex provides horizontally scalable, multi-tenant, long-term storage for Prometheus metrics when used as a remote write destination, and a horizontally scalable, Prometheus-compatible query API.
- Cortex: open-source, horizontally-scalable, distributed Prometheus, 06/2017
- Project Frankenstein: Multitenant, Scale-Out Prometheus: video | slides, 09/2016
InfluxDB is a time series database designed to handle high write and query loads and is meant to be used as a backing store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics. InfluxDB supports the Prometheus remote read and write API.
Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity. It can be added seamlessly on top of existing Prometheus deployments and leverages the Prometheus 2.0 storage format to cost-efficiently store historical metric data in any object storage while retaining fast query latencies. Additionally, it provides a global query view across all Prometheus installations and can merge data from Prometheus HA pairs on the fly.
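To make the idea of merging data from HA pairs more concrete, here is a deliberately simplified sketch in Python. It is not how Thanos implements deduplication; it only illustrates the concept of treating two replicas as one logical series by ignoring a replica-identifying label. The function name `merge_ha_series`, the label name `replica`, and the data layout are all assumptions made for this illustration.

```python
from collections import defaultdict


def merge_ha_series(series_list, replica_label="replica"):
    """Merge series that differ only in the (hypothetical) replica label.

    series_list: iterable of (labels, samples), where labels is a dict and
    samples is a list of (timestamp, value) tuples.
    Returns a dict mapping the replica-free label set to sorted samples.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # Identify the logical series by all labels except the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            # Keep the first value seen per timestamp; later replicas only fill gaps.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Two replicas scraping the same target, each missing one sample.
replica_a = ({"__name__": "up", "job": "api", "replica": "a"}, [(10, 1.0), (20, 1.0)])
replica_b = ({"__name__": "up", "job": "api", "replica": "b"}, [(20, 1.0), (30, 1.0)])

result = merge_ha_series([replica_a, replica_b])
for key, points in result.items():
    print(dict(key), points)
```

In this toy example the two replicas end up as a single series with samples at timestamps 10, 20, and 30, which is the gap-filling effect you want from an HA pair.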
So to sum up: while Prometheus itself does not support long-term retention of the time series data of interest, there are a number of solutions you can choose from to keep the metrics around for as long as needed. I hope this quick review gives you an idea of some of the available options and can serve as the basis for your own research, should you find yourself in a situation where you have to select one. I wish you successful monitoring, and please do share your findings and hands-on experiences with the LTS solutions discussed above or with others not covered here.
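As a starting point for such research, it can help to estimate how far local storage alone would get you. The Prometheus documentation gives a rough capacity formula (disk space = retention time in seconds × ingested samples per second × bytes per sample, with around 1 to 2 bytes per sample on average). The sketch below simply rearranges that formula; the function name and the example numbers are assumptions for illustration, and real usage will vary with churn and compression.

```python
SECONDS_PER_DAY = 86_400


def estimate_retention_days(disk_bytes, samples_per_second, bytes_per_sample=2.0):
    """Estimate how many days of samples fit on a local disk.

    Rearranges: disk = retention_seconds * samples_per_second * bytes_per_sample.
    bytes_per_sample defaults to 2.0, the upper end of the documented average.
    """
    if samples_per_second <= 0 or bytes_per_sample <= 0:
        raise ValueError("samples_per_second and bytes_per_sample must be positive")
    retention_seconds = disk_bytes / (samples_per_second * bytes_per_sample)
    return retention_seconds / SECONDS_PER_DAY


# Assumed example: 500 GB of disk, 100,000 samples ingested per second.
days = estimate_retention_days(500 * 10**9, 100_000)
print(f"{days:.1f} days")  # about 28.9 days
```

If the estimate falls well short of the retention you need, that is a good indication an LTS solution is worth evaluating.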
Cover image kudos to jesse orrico via Unsplash.