
Prathamesh Sonpatki for Last9

Originally published at last9.io

Prometheus Alternatives


Prometheus is a popular open-source platform for metrics and alerting created by SoundCloud in 2012 and officially released as open-source in 2015. Designed for both dynamic service-oriented architectures and system monitoring, Prometheus focuses on reliability, multidimensional data collection, and data visualization.

While Prometheus is an excellent option for tracking metrics, other open-source and SaaS alternatives in the ecosystem might better suit your needs.

This article compares Prometheus with InfluxDB, Zabbix, Datadog, Graphite, and Grafana based on their data model and storage, architecture, APIs and access methods, partitioning, compatible operating systems, pricing, visualization, alerting, supported programming languages, use cases, and supported workloads.

Prometheus Alternatives

The following is an overview of each tool compared in this article.

What is Prometheus?

As mentioned above, Prometheus is a monitoring and alerting system that helps developers monitor applications, tools, databases, and networks. It has a comprehensive set of built-in features for collecting metric data and acts as a full-stack observability and monitoring system for microservices and cloud-native applications. It joined the Cloud Native Computing Foundation (CNCF) in 2016 as the second hosted project after Kubernetes. While Prometheus is an excellent tool for DevOps and SRE teams, it can run into scalability issues, which is where tools such as Thanos, Cortex, and Levitate can help.

InfluxDB

InfluxDB is a leading time series database that comes in three editions: an open-source version called InfluxDB and two commercial versions called InfluxDB Cloud and InfluxDB Enterprise. It provides a complete set of data tools for ingesting, processing, and manipulating multiple data points. It includes the InfluxDB user interface (InfluxDB UI) and Flux, a functional scripting and query language.

Zabbix

Zabbix is a scalable, accessible, open-source monitoring solution used for both small environments and enterprise-level distributed systems with millions of metrics.

Datadog

Datadog is a monitoring and analytics platform used for event monitoring and measuring the performance of cloud applications and infrastructure. It combines real-time metrics from disparate sources such as applications, servers, databases, and containers with end-to-end tracing to deliver alerts and visualizations. It can collect data from various data sources with its built-in integrations.

Graphite

Created by Chris Davis at Orbitz in 2006 and released as open source in 2008, Graphite is a monitoring solution that collects time series data from applications, servers, infrastructure, and networks. It focuses on storing passive time series data and analyzing it through the Graphite web UI.

Grafana

Grafana is a data visualization tool developed by Grafana Labs. It is available as open source, managed (Grafana Cloud), or enterprise edition. Grafana can combine data from many data sources into a single dashboard. It solves the problem of visualization of time series data.

Is Grafana the same as Prometheus?

This is a common question. Prometheus is a time series database, while Grafana is a data visualization tool that supports Prometheus, Graphite, and InfluxDB as data sources. So they are not the same, but they work well together: Grafana is a de facto standard for visualizing Prometheus data.

Prometheus Alternatives in action

This section compares Prometheus to InfluxDB, Zabbix, Datadog, and Graphite using the following criteria:

  • Data model and storage
  • Architecture
  • APIs and access methods
  • Partitioning
  • Compatible operating systems
  • Supported programming languages
  • Open Source vs. Proprietary

Data Model and Storage

Prometheus collects metrics as time series data and stores them in a local database. Each time series is uniquely identified by a metric name and an optional set of key-value pairs called labels.

Data can be queried in real-time using the Prometheus Query Language (PromQL) and presented in tabular or graphical form.

Prometheus stores float64 sample values with millisecond-resolution timestamps and offers only limited support for strings. It can also forward samples to long-term storage backends through the Prometheus remote write protocol, and it can run in agent mode.
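To make the data model concrete, here is a minimal Python sketch of how a series identity can be represented as a metric name plus a label set. The function names (`series_key`, `render_series`, `append_sample`) and the in-memory dictionary are hypothetical and only illustrate the idea; they are not how Prometheus implements its storage.

```python
from typing import Dict, List, Tuple

# A series identity: the metric name plus its label pairs in sorted order.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]
# A sample: (timestamp in milliseconds, float64 value).
Sample = Tuple[int, float]


def series_key(metric: str, labels: Dict[str, str]) -> SeriesKey:
    """Return an order-independent identity for a time series."""
    return metric, tuple(sorted(labels.items()))


def render_series(metric: str, labels: Dict[str, str]) -> str:
    """Render the identity in the familiar name{label="value"} notation."""
    if not labels:
        return metric
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{body}}}"


def append_sample(store: Dict[SeriesKey, List[Sample]], metric: str,
                  labels: Dict[str, str], ts_ms: int, value: float) -> None:
    """Append one sample to the series identified by metric and labels."""
    store.setdefault(series_key(metric, labels), []).append((ts_ms, float(value)))
```

Two samples with the same metric name and the same labels land in the same series regardless of label order, while changing any label value creates a new series.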

InfluxDB: Data Model and Storage

InfluxDB maintains a time series database optimized for time-stamped data, much like Prometheus. Each data point combines a measurement, a set of tags, a set of fields, and a timestamp. Tags are indexed key-value pairs used as labels, while fields are unindexed key-value pairs that hold the recorded values and have only limited use as secondary labels.

InfluxDB provides its own SQL-like query language called InfluxQL and supports timestamp, float64, int64, string, and bool data types.
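The measurement, tags, fields, and timestamp described above map directly onto InfluxDB's line protocol. The sketch below formats one point as a line protocol string. The helper names are hypothetical and the escaping is simplified (it does not handle backslashes inside keys), so treat it as an illustration rather than a replacement for an official client library.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def _escape_key(text: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(text: str) -> str:
    """Escape commas and spaces in the measurement name."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def _format_field(value: FieldValue) -> str:
    """Format a field value according to its type."""
    if isinstance(value, bool):  # test bool first, because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue], ts_ns: int) -> str:
    """Build one line: measurement,tags fields timestamp (nanoseconds)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {ts_ns}"
```

Note how each of the data types listed above (float64, int64, string, bool) gets its own textual form in the field section, while tag values are always plain strings.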

Zabbix: Data Model and Storage

Zabbix uses an external database to store the collected data and configuration information. It integrates with leading relational database management systems (RDBMS) such as MySQL, MariaDB, Oracle, PostgreSQL, IBM Db2, and SQLite, which allows Zabbix to store more complex data types such as system logs. Zabbix stores raw data collected from hosts in history tables, while trends tables store consolidated hourly data.
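The relationship between history and trends can be sketched as a simple hourly rollup. The function below is a hypothetical illustration, not Zabbix code: it assumes raw history arrives as `(unix_timestamp, value)` pairs, and the result keys (`num`, `value_min`, `value_avg`, `value_max`) are illustrative names for the per-hour count, minimum, average, and maximum.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_HOUR = 3600


def hourly_trends(history: Iterable[Tuple[int, float]]) -> Dict[int, Dict[str, float]]:
    """Consolidate raw (clock, value) samples into one summary row per hour."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for clock, value in history:
        hour_start = clock - clock % SECONDS_PER_HOUR  # truncate to the hour
        buckets[hour_start].append(value)
    return {
        hour: {
            "num": len(values),
            "value_min": min(values),
            "value_avg": sum(values) / len(values),
            "value_max": max(values),
        }
        for hour, values in sorted(buckets.items())
    }
```

A rollup like this is why trends data is far cheaper to keep for long periods than raw history: each hour collapses to a single row no matter how many samples were collected.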

Datadog: Data Model and Storage

Datadog uses Kafka to process incoming data points and a mix of Redis, Cassandra, and S3 to store and query time series. It also uses Elasticsearch to store and query events (such as alerts and deployments) that are not represented as a time series and uses PostgreSQL for metadata.

Graphite: Data Model and Storage

Like Prometheus, Graphite stores time series data using its specialized database, but data collection is passive. Data is collected from collection daemons or other monitoring tools (including Prometheus) and sent to Graphite's Carbon component.
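Because Graphite is passive, something has to push data to Carbon. A minimal sketch of that push, assuming Carbon's plaintext protocol (one `path value timestamp` line per data point) and the commonly used default port 2003, could look like this. The host name, port, and function names are assumptions for illustration.

```python
import socket
import time
from typing import Iterable, Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point as '<metric path> <value> <unix timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_carbon(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Send preformatted lines to a Carbon listener over TCP (assumed host and port)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A collection daemon would call `carbon_line` for each measurement and hand the batch to `send_to_carbon` on a fixed interval.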

Summary

InfluxDB and Graphite both use time series databases similar to Prometheus. Graphite, however, doesn't store raw data as Prometheus does. InfluxDB offers full support for strings and timestamps as well as int64 and bool data types, while Prometheus only provides full support for float64. Zabbix integrates with more familiar RDBMS database engines and is suitable for storing historical data. At the same time, Datadog uses several data models and storage types to store both time-series and non-time-series data.

Architecture

Prometheus servers are standalone and run independently of each other. They rely on local on-disk storage rather than network or remote storage services for the core functionality of scraping, rule processing, and alerting. By default, data is retained for 15 days, but Prometheus can be integrated with remote solutions such as Levitate for long-term storage.

InfluxDB: Architecture

Like Prometheus, open-source InfluxDB servers are standalone and use local storage for scraping, alerting, and rule processing. Commercial InfluxDB versions come with distributed storage by default that allows queries and storage to be managed by many nodes simultaneously, making it easier to perform horizontal scaling.

Zabbix: Architecture

Zabbix architecture comprises servers that store statistical, operational, and configuration data and agents installed on the machines that collect the data. Agents monitor and report data collected from local resources and applications to Zabbix servers.

Agents and servers support passive checks, where the server requests a value from the agent, and active checks, where the agent periodically sends results to the server.

Datadog: Architecture

Datadog places Kafka in front of its independent storage systems, where it acts as a persistent layer between ingestion and the storage and query backends. Kafka is an open-source, distributed, partitioned, replicated log service originally developed at LinkedIn as a unified platform for handling large-scale, real-time data feeds.

Graphite: Architecture

Graphite architecture is made up of three components:

  1. Carbon, the primary backend daemon that listens for time series data sent to Graphite and stores it in Whisper, the backend database
  2. Whisper, a fast, file-based local time series database that creates one file per stored metric
  3. The Graphite web UI, the frontend UI for the backend storage system that renders graphs on demand
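The "one file per stored metric" layout of Whisper can be illustrated by mapping a dotted metric name to a file path. This is a sketch under assumptions: the storage root shown is a commonly seen default and may differ in your installation, and `whisper_path` is a hypothetical helper, not part of Graphite.

```python
from pathlib import PurePosixPath


def whisper_path(metric: str, root: str = "/opt/graphite/storage/whisper") -> PurePosixPath:
    """Map a dotted metric name to the .wsp file that would hold its data."""
    parts = metric.split(".")
    if not all(parts):
        raise ValueError(f"invalid metric name: {metric!r}")
    # Every dot-separated segment except the last becomes a directory.
    return PurePosixPath(root, *parts[:-1], parts[-1] + ".wsp")
```

With this layout, the number of files on disk grows with the number of distinct metrics, which is worth keeping in mind when metric names include high-cardinality segments.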

Summary

While InfluxDB and Prometheus both use standalone servers, commercial versions of InfluxDB offer distributed storage to support horizontal scaling. The Zabbix architectural model uses servers with agents, which allows for both passive and active checks. Datadog's use of Kafka as its persistent data layer enables it to handle large amounts of real-time data. Graphite's architecture includes a web app, which makes it a good choice if you want to render graphs on demand.

APIs and Access Methods

Prometheus uses RESTful HTTP endpoints with responses in JSON.
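As a rough illustration, an instant query goes to the `/api/v1/query` endpoint and returns a JSON document with a `status` field and a `data` object. The sketch below builds the URL and parses a vector result without making a network call. The base URL and the helper names are assumptions.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against a Prometheus server."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector JSON response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is a [timestamp, "value as string"] pair.
    return [(series["metric"], float(series["value"][1])) for series in data["result"]]
```

Fetching the URL with any HTTP client and passing the response body to `parse_vector` yields one entry per matching series.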

InfluxDB: APIs and Access Methods

The InfluxDB API provides a set of HTTP endpoints for accessing and managing system information, security and access control, resource access, data I/O, and other resources and returns JSON-formatted responses. The Enterprise version also provides support for TCP and UDP ports.

Zabbix: APIs and Access Methods

Zabbix uses the JSON-RPC 2.0 protocol. Requests and responses between clients and the API are encoded using JSON.
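A JSON-RPC 2.0 exchange is just a JSON object posted over HTTP, typically to the frontend's `api_jsonrpc.php` endpoint. The sketch below builds a request body and unwraps a response. The helper names are hypothetical, and `apiinfo.version` is used in the usage note only because it is a simple method to try.

```python
import json
from typing import Any


def rpc_request(method: str, params: Any, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })


def rpc_result(body: str) -> Any:
    """Return the 'result' member of a response, or raise if it carries an error."""
    doc = json.loads(body)
    if "error" in doc:
        err = doc["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    return doc["result"]
```

For example, `rpc_request("apiinfo.version", [])` produces a body that asks the server for its API version, and `rpc_result` extracts the version string from the reply.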

Datadog: APIs and Access Methods

Datadog uses the HTTP REST API. Resource-oriented URLs are used to call the API, with JSON being returned from all requests.

Graphite: APIs and Access Methods

Graphite data is queried over HTTP via its Metrics API or the Render URL API. The Graphite API is an alternative to the Graphite web UI that retrieves metrics from a time series database and renders graphs or generates JSON data based on these time series.
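For instance, a Render URL API request is an HTTP GET with one or more `target` parameters plus a time range and an output format. The following sketch only builds the URL. The host name is a placeholder and `render_url` is a hypothetical helper.

```python
from typing import Iterable
from urllib.parse import urlencode


def render_url(base_url: str, targets: Iterable[str],
               since: str = "-1h", fmt: str = "json") -> str:
    """Build a /render URL that asks for the given targets as JSON."""
    params = [("target", target) for target in targets]
    params += [("from", since), ("format", fmt)]
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"
```

Omitting the `format` parameter from such a request would ask Graphite to render a graph image rather than return JSON data.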

Summary

All tools provide support for HTTP requests and JSON-formatted responses.

Partitioning

Prometheus supports sharding. You can scale horizontally by splitting scrape targets into shards across multiple Prometheus servers, so that each server runs as a smaller instance.
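One common way to split targets is a hash-and-modulus scheme, similar in spirit to the `hashmod` relabeling action. The sketch below is a simplified Python illustration of that idea. It is not guaranteed to produce the same shard numbers as Prometheus itself, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def shard_for(target: str, shard_count: int) -> int:
    """Pick a shard for a target by hashing its address and taking the modulus."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % shard_count


def assign_targets(targets: Iterable[str], shard_count: int) -> Dict[int, List[str]]:
    """Group targets by the shard (Prometheus server) that should scrape them."""
    shards: Dict[int, List[str]] = {index: [] for index in range(shard_count)}
    for target in targets:
        shards[shard_for(target, shard_count)].append(target)
    return shards
```

Because the assignment depends only on the target address and the shard count, every server can compute independently which targets it owns. The trade-off is that changing the shard count moves many targets to a different server.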

InfluxDB: Partitioning

InfluxDB organizes data into shards to create a highly scalable approach that increases throughput and maintains performance as the data grows. Shards are placed into shard groups containing encoded and compressed time series data for a specific time range. The shard group duration defines the period for each shard group, and each group has a corresponding retention policy that applies to all the shards within the group.
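To illustrate how a shard group duration partitions time, the sketch below computes the time range of the shard group a timestamp would fall into and checks whether that group has outlived its retention period. Aligning groups to the Unix epoch is an assumption made for simplicity, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_range(ts: datetime, duration: timedelta) -> Tuple[datetime, datetime]:
    """Return the [start, end) range of the shard group containing ts (timezone-aware)."""
    index = (ts - EPOCH) // duration  # number of whole durations since the epoch
    start = EPOCH + index * duration
    return start, start + duration


def is_expired(group_end: datetime, retention: timedelta, now: datetime) -> bool:
    """A shard group can be dropped once its newest possible point exceeds retention."""
    return group_end + retention <= now
```

Dropping whole shard groups at once is what makes retention cheap: expired data is removed by deleting a group rather than by deleting individual points.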

Zabbix: Partitioning

Partitioning with Zabbix depends on the database being used. MySQL, PostgreSQL, IBM Db2, and MariaDB (with the Spider storage engine) offer sharding capabilities.

Datadog: Partitioning

Datadog uses Kafka partitions to scale by customer, metric, and tag set. You can isolate by the customer or scale concurrently by metric. Sharding is implemented as a group of Kafka partitions.

Graphite: Partitioning

Graphite does not support partitioning.

Summary

All tools except Graphite offer some form of support for partitioning. Prometheus, InfluxDB, and Datadog provide sharding and horizontal scaling features, while Zabbix support depends on your chosen external database.

Compatible Operating Systems

Prometheus supports the Linux and Windows operating systems.

InfluxDB: Compatible Operating Systems

InfluxDB supports Linux, Windows, and macOS.

Zabbix: Compatible Operating Systems

Zabbix supports Linux, Windows, macOS, IBM AIX, Solaris, and HP-UX operating systems.

Datadog: Compatible Operating Systems

Datadog supports Windows, Linux, and macOS operating systems and cloud service providers, including Google Cloud, AWS, Red Hat OpenShift, and Microsoft Azure.

Graphite: Compatible Operating Systems

Graphite supports Linux and Unix operating systems.

Summary

All tools except Graphite support both Windows and Linux; Graphite only supports Linux and Unix. InfluxDB, Zabbix, and Datadog also support macOS, with Datadog providing additional support for cloud service providers.

Supported Programming Languages

Prometheus provides several official and unofficial client libraries for .NET, C++, Go, Haskell, Java, JavaScript (Node.js), Python, and Ruby. It also supports Prometheus Exporters to collect data from systems that do not directly have client libraries.

InfluxDB: Supported Programming Languages

InfluxDB provides client libraries for C++, Java, JavaScript, .NET, Perl, PHP, and Python. It can also be used directly through its HTTP API.

Zabbix: Supported Programming Languages

Zabbix supports Java, JavaScript, .NET, Perl, PHP, Python, R, Ruby, Elixir, Go, and Rust.

Datadog: Supported Programming Languages

Client libraries are available in C#/.NET, Java, Python, PHP, Go, Node.js, Ruby, and Swift, along with many integrations.

Graphite: Supported Programming Languages

Graphite has client libraries in Python and JavaScript (Node.js) programming languages.

Summary

Prometheus, InfluxDB, Zabbix, and Datadog all support the major programming languages. Graphite, however, only provides support for Python and JavaScript.

Comparison summary

| | Prometheus | InfluxDB | Zabbix | Datadog | Graphite | Levitate |
|---|---|---|---|---|---|---|
| Data model and storage | Multi-dimensional data model with time series data | Time series data | External database stores, including RDBMS | Both time series and non-time-series data | Time series data | PromQL-compatible time series data |
| API and access methods | HTTP API | HTTP API | HTTP API | HTTP API | HTTP API | HTTP API |
| Partitioning | Supported | Supported | Supported, depends on RDBMS of choice | Supported | Supported | Managed TSDB |
| Open source | Yes | Yes. Proprietary also available. | Yes | No. Proprietary | Yes | No. Proprietary |
| Programming languages | Tons of client libraries and exporters | C++, Java, JavaScript, .NET, Perl, PHP, and Python | Java, JavaScript, .NET, Perl, PHP, Python, R, Ruby, Elixir, Go, and Rust | Tons of integrations | Python and JavaScript (Node.js) | It can be directly used with the REST API |

Prometheus's strengths lie in its support for multidimensional data collection. It has a powerful query language that can be used for both dynamic service-oriented architectures and machine-centric monitoring. It's a good choice when you primarily want to record numeric time series.

InfluxDB and Prometheus use similar data compression techniques and support multidimensional data using key-value data stores; InfluxDB is better for event logging. A commercial version provides the best option if you need to process large amounts of data, as its default configuration scales horizontally.

Zabbix focuses on hardware and device management and monitoring. It's a better option than Prometheus if you are more familiar with RDBMS database engines and need to store many historical and varied data types. However, the use of an external database can slow down performance.

Prometheus's internal time series database provides faster access to data but is not suitable for storing data types like text or event logs. Since Prometheus keeps data for only 15 days by default, it's also not a good option if you need to store historical data (unless configured for remote storage).

Both Datadog and Prometheus can be used for application performance monitoring (APM). However, Datadog has more application monitoring capabilities than Prometheus and is geared toward monitoring infrastructure at scale. Datadog is best for monitoring infrastructure and apps and visualizing data from disparate sources in mid to large-scale environments.

Graphite runs well on all hardware and cloud infrastructure, making it suitable for small businesses with limited resources and large-scale production environments. Choose Graphite when you need a solution focused on storing and analyzing historical data and fast retrieval.

Conclusion

Prometheus is a popular option for tracking metrics and alerting, but one of the four alternatives mentioned above might suit your needs depending on your requirements.

For processing large amounts of data, choose a commercial version of InfluxDB, but if you want the familiarity of an RDBMS engine, then go with Zabbix. Datadog's wide range of monitoring features makes it the go-to choice for monitoring infrastructure in larger environments. Still, if you operate on a smaller scale, Graphite can get the job done with whatever hardware and resources you have.

Last9 is a site reliability engineering (SRE) platform that removes the guesswork in improving the reliability of your distributed systems. Last9's Levitate, a managed time series database (TSDB), helps you understand, track, and improve your organization's system dependencies to reduce the challenges of time series database management.

Access the intelligence you need to deliver reliable software with Last9's reliability platform.


This post was originally published on Last9 Blog.
